# Optimization Guide
This guide covers optimization strategies for getting the best performance from compressionz.
## Build Optimization

### Always Use ReleaseFast

Performance differs dramatically between debug and release builds:
```sh
# Debug build (default)
zig build
# LZ4: ~500 MB/s

# Release build
zig build -Doptimize=ReleaseFast
# LZ4: ~36 GB/s (72× faster!)
```

| Build Mode | LZ4 Speed | Use Case |
|---|---|---|
| Debug | ~500 MB/s | Development |
| ReleaseSafe | ~20 GB/s | Production with checks |
| ReleaseFast | ~36 GB/s | Maximum performance |
| ReleaseSmall | ~25 GB/s | Minimal binary size |
### Recommendation

- Development: Debug (fast compilation)
- Testing: ReleaseSafe (catches bugs)
- Production: ReleaseFast (maximum speed)
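
To make these modes easy to switch per environment, expose the standard optimize option in your `build.zig`. A minimal sketch, assuming a typical single-executable project (the name `app` and the path `src/main.zig` are placeholders):

```zig
const std = @import("std");

pub fn build(b: *std.Build) void {
    // Honors `zig build -Doptimize=ReleaseFast` (and the other modes) from the CLI.
    const target = b.standardTargetOptions(.{});
    const optimize = b.standardOptimizeOption(.{});

    const exe = b.addExecutable(.{
        .name = "app", // placeholder
        .root_source_file = b.path("src/main.zig"), // placeholder
        .target = target,
        .optimize = optimize,
    });
    b.installArtifact(exe);
}
```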
## Codec Selection

### Speed Priority

```zig
// Fastest compression
const fastest = try cz.compress(.lz4_raw, data, allocator);

// Fast with self-describing format
const fast = try cz.compress(.snappy, data, allocator);

// Best balance of speed and ratio
const balanced = try cz.compress(.zstd, data, allocator);
```

### Ratio Priority
```zig
// Best ratio for one-time compression
const smallest = try cz.compressWithOptions(.brotli, data, allocator, .{
    .level = .best,
});

// Best ratio with reasonable speed
const balanced = try cz.compressWithOptions(.zstd, data, allocator, .{
    .level = .best,
});
```

### Use Case Matrix

| Scenario | Codec | Level | Throughput |
|---|---|---|---|
| Real-time | LZ4 Raw | default | 36 GB/s |
| Messaging | Snappy | default | 31 GB/s |
| General | Zstd | default | 12 GB/s |
| Archival | Zstd | best | 1.3 GB/s |
| Web assets | Brotli | best | 86 MB/s |
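
If your application handles several of these scenarios, a small dispatch helper keeps the choice in one place. A sketch: the `Scenario` enum is illustrative, and only the `cz.Codec` members come from the matrix above.

```zig
const cz = @import("compressionz");

const Scenario = enum { real_time, messaging, general, archival, web_assets };

/// Map a scenario from the matrix above to a codec (illustrative helper;
/// the archival level is chosen separately at the call site).
fn codecFor(scenario: Scenario) cz.Codec {
    return switch (scenario) {
        .real_time => .lz4_raw,
        .messaging => .snappy,
        .general, .archival => .zstd,
        .web_assets => .brotli,
    };
}
```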
## Memory Optimization

### Zero-Copy for Hot Paths

Avoid allocation overhead with pre-allocated buffers:

```zig
// Standard API (allocates each time)
for (items) |item| {
    const compressed = try cz.compress(.lz4, item, allocator);
    defer allocator.free(compressed); // Freed at the end of each iteration
    try process(compressed);
}

// Zero-copy (no allocations)
var buffer: [65536]u8 = undefined;
for (items) |item| {
    const compressed = try cz.compressInto(.lz4, item, &buffer, .{});
    try process(compressed);
}
```

### Buffer Reuse Pattern
```zig
const Compressor = struct {
    buffer: []u8,

    pub fn init(allocator: std.mem.Allocator, max_input_size: usize) !Compressor {
        // Size the buffer for the worst case so compressInto can never overflow.
        const buffer_size = cz.maxCompressedSize(.lz4, max_input_size);
        return .{
            .buffer = try allocator.alloc(u8, buffer_size),
        };
    }

    pub fn compress(self: *Compressor, data: []const u8) ![]u8 {
        return cz.compressInto(.lz4, data, self.buffer, .{});
    }
};
```

### Arena Allocators for Batches

```zig
pub fn processBatch(items: []const []const u8, backing: std.mem.Allocator) !void {
    var arena = std.heap.ArenaAllocator.init(backing);
    defer arena.deinit(); // One free for all allocations

    for (items) |item| {
        const compressed = try cz.compress(.zstd, item, arena.allocator());
        try sendData(compressed); // No individual frees needed
    }
}
```

## Compression Level Selection
### Level Impact by Codec

Zstd:
| Level | Compress | Ratio | Notes |
|---|---|---|---|
| fast | 12 GB/s | 99.9% | Recommended |
| default | 12 GB/s | 99.9% | Same as fast |
| best | 1.3 GB/s | 99.9% | 9× slower, marginal gain |
Brotli:
| Level | Compress | Ratio | Notes |
|---|---|---|---|
| fast | 1.3 GB/s | 99.9% | Dynamic content |
| default | 1.3 GB/s | 99.9% | Same as fast |
| best | 86 MB/s | 99.9%+ | Only for static content |
### Recommendation

Use `.default` unless you have a specific reason:

- `.fast` rarely helps (it is often the same as the default)
- `.best` has diminishing returns for most data
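
In code, that means calling plain `cz.compress` for the default and reaching for `compressWithOptions` only when you deviate. A sketch:

```zig
// Everyday use: the default level, no options needed.
const everyday = try cz.compress(.zstd, data, allocator);

// Write-once, read-many data: .best can be worth the 9× slower compression.
const archive = try cz.compressWithOptions(.zstd, data, allocator, .{
    .level = .best,
});
```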
## Streaming Optimization

### Chunk Size

Larger chunks give better throughput at the cost of more memory:

```zig
// Small chunks (more overhead)
var small_buf: [4096]u8 = undefined;

// Large chunks (better throughput)
var large_buf: [65536]u8 = undefined; // Recommended

// Very large (diminishing returns)
var huge_buf: [1048576]u8 = undefined;
```

### Pipeline Pattern
Process data as it arrives:

```zig
pub fn streamProcess(input: anytype, output: anytype, allocator: std.mem.Allocator) !void {
    var decomp = try cz.decompressor(.gzip, allocator, input);
    defer decomp.deinit();

    var comp = try cz.compressor(.zstd, allocator, output, .{});
    defer comp.deinit();

    var buf: [65536]u8 = undefined;
    while (true) {
        const n = try decomp.reader().read(&buf);
        if (n == 0) break;
        try comp.writer().writeAll(buf[0..n]);
    }
    try comp.finish();
}
```

## Dictionary Optimization
### When to Use Dictionaries

| Data Size | Without Dict | With Dict | Use Dict? |
|---|---|---|---|
| 100 B | 105 B | 45 B | ✅ Yes |
| 1 KB | 780 B | 380 B | ✅ Yes |
| 10 KB | 3 KB | 1.9 KB | ✅ Yes |
| 100 KB | 28 KB | 24 KB | Maybe |
| 1 MB | 684 KB | 680 KB | ❌ No |
Rule of thumb: Use dictionaries for data < 10 KB with known patterns.
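
The dictionary-passing API isn't shown on this page, so the following is only a sketch: the `.dictionary` option field and the `api_dict.bin` file are hypothetical stand-ins for whatever compressionz actually exposes.

```zig
// Hypothetical: `.dictionary` is an assumed option name, not confirmed API.
// Small JSON messages share field names, so a pre-built shared dictionary
// pays off for payloads under ~10 KB.
const dict = @embedFile("api_dict.bin"); // hypothetical pre-built dictionary

const compressed = try cz.compressWithOptions(.zstd, small_json, allocator, .{
    .dictionary = dict,
});
```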
### Dictionary Size

| Use Case | Size | Notes |
|---|---|---|
| JSON APIs | 16-32 KB | Common field names |
| Log messages | 32-64 KB | Common log patterns |
| Protocol buffers | 8-16 KB | Schema patterns |
Larger dictionaries have diminishing returns.
## Parallelization

### Independent Data

Compress multiple items in parallel:
```zig
const std = @import("std");
const cz = @import("compressionz");

pub fn compressParallel(items: []const []const u8, allocator: std.mem.Allocator) ![][]u8 {
    const results = try allocator.alloc([]u8, items.len);

    var pool: std.Thread.Pool = undefined;
    try pool.init(.{ .allocator = allocator });
    errdefer pool.deinit();

    for (items, 0..) |item, i| {
        try pool.spawn(compressOne, .{ item, allocator, &results[i] });
    }

    pool.deinit(); // Joins all workers; results are ready afterwards
    return results;
}

fn compressOne(item: []const u8, allocator: std.mem.Allocator, result: *[]u8) void {
    // Sketch only: production code should report errors instead of panicking.
    result.* = cz.compress(.zstd, item, allocator) catch unreachable;
}
```

### Large Single File
Split into chunks:

```zig
pub fn compressLargeFile(data: []const u8, chunk_size: usize, allocator: std.mem.Allocator) ![][]u8 {
    const num_chunks = (data.len + chunk_size - 1) / chunk_size;
    const chunks = try allocator.alloc([]u8, num_chunks);

    // Compress chunks in parallel (shown serially here; spawn each chunk on a
    // Thread.Pool, as in compressParallel above, for real parallelism).
    for (chunks, 0..) |*chunk, i| {
        const start = i * chunk_size;
        const end = @min(start + chunk_size, data.len);
        chunk.* = try cz.compress(.zstd, data[start..end], allocator);
    }
    return chunks;
}
```

## Benchmarking Your Data
Test with your actual data:

```zig
const std = @import("std");
const cz = @import("compressionz");

pub fn benchmark(data: []const u8, allocator: std.mem.Allocator) !void {
    const codecs = [_]cz.Codec{ .lz4, .snappy, .zstd, .gzip, .brotli };

    std.debug.print("Input size: {d} bytes\n\n", .{data.len});
    std.debug.print("{s:<12} {s:>10} {s:>10} {s:>10}\n", .{
        "Codec", "Size", "Compress", "Decompress",
    });

    inline for (codecs) |codec| {
        var timer = try std.time.Timer.start();

        const compressed = try cz.compress(codec, data, allocator);
        const compress_ns = timer.read();

        timer.reset();
        const decompressed = try cz.decompress(codec, compressed, allocator);
        const decompress_ns = timer.read();

        std.debug.print("{s:<12} {d:>10} {d:>9}µs {d:>9}µs\n", .{
            codec.name(), compressed.len, compress_ns / 1000, decompress_ns / 1000,
        });

        // Free only after printing, since compressed.len is read above.
        allocator.free(compressed);
        allocator.free(decompressed);
    }
}
```

## Common Pitfalls
### 1. Debug Builds in Production

```sh
# Wrong: 72× slower
zig build && ./app

# Right: Full speed
zig build -Doptimize=ReleaseFast && ./app
```

### 2. Over-Compressing
```zig
// Wrong: Compressing already compressed data
const gzip_data = try cz.compress(.gzip, image_data, allocator);
const zstd_data = try cz.compress(.zstd, gzip_data, allocator); // Waste of CPU!

// Right: Compress once
const compressed = try cz.compress(.zstd, raw_data, allocator);
```

### 3. Wrong Codec for Use Case

```zig
// Wrong: Brotli at .best for real-time data
const compressed = try cz.compressWithOptions(.brotli, message, allocator, .{
    .level = .best, // 86 MB/s is too slow for real-time!
});

// Right: Use LZ4 or Snappy for real-time
const fast = try cz.compress(.lz4_raw, message, allocator);
```

### 4. Allocating in Hot Loops
```zig
// Wrong: Allocation per iteration
while (hasData()) {
    const compressed = try cz.compress(.lz4, getData(), allocator);
    defer allocator.free(compressed);
    try send(compressed);
}

// Right: Reuse a stack buffer
var buffer: [65536]u8 = undefined;
while (hasData()) {
    const compressed = try cz.compressInto(.lz4, getData(), &buffer, .{});
    try send(compressed);
}
```

## Summary
- Use ReleaseFast for production
- Choose the right codec for your use case
- Use `.default` level unless you have specific needs
- Reuse buffers in hot paths
- Use dictionaries for small, structured data
- Benchmark with your actual data