TurboQuant Doesn't Impact DIMM Count
If compression doesn't cross a DIMM boundary, it has zero hardware impact
The Market Overreaction
Google's TurboQuant has triggered a sharp reaction across memory markets, driven by a headline claim of up to 6x memory reduction with no loss in accuracy.
However, this narrative misses two critical facts:
1. TurboQuant compresses KV cache only - not total system memory.
2. Even a large percentage reduction does not reduce hardware purchases unless it removes at least one whole DIMM from the configuration.
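The two points above can be combined into a short arithmetic sketch. The figures used here (a 120 GB footprint, 64 GB DIMMs, and the KV cache shares) are hypothetical values chosen for illustration, not measurements. The function names are also illustrative. The sketch assumes DIMM count is simply the required capacity rounded up to whole modules, and it applies the headline 6x ratio to the KV cache portion only.

```python
import math


def dimms_needed(required_gb: float, dimm_gb: int) -> int:
    """Whole DIMMs needed to hold required_gb (capacity rounds up)."""
    return math.ceil(required_gb / dimm_gb)


def memory_after_kv_compression(total_gb: float, kv_fraction: float, ratio: float) -> float:
    """Total footprint when only the KV cache share shrinks by `ratio`."""
    kv_gb = total_gb * kv_fraction
    other_gb = total_gb - kv_gb  # weights, activations, etc. are unchanged
    return other_gb + kv_gb / ratio


# Hypothetical server: 120 GB footprint, 64 GB DIMMs, 6x KV cache compression.
TOTAL_GB, DIMM_GB, RATIO = 120, 64, 6

for kv_fraction in (0.10, 0.60):  # hypothetical KV cache shares
    after_gb = memory_after_kv_compression(TOTAL_GB, kv_fraction, RATIO)
    print(
        f"KV share {kv_fraction:.0%}: {TOTAL_GB} GB -> {after_gb:.0f} GB, "
        f"DIMMs {dimms_needed(TOTAL_GB, DIMM_GB)} -> {dimms_needed(after_gb, DIMM_GB)}"
    )
```

With a 10% KV cache share, the footprint falls from 120 GB to 110 GB and the count stays at two DIMMs. Only when the KV cache share is large enough to pull the total under 64 GB does a DIMM drop out.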
What KV Cache Actually Is
KV cache is not abstract - it occupies real, physical memory:
• Stored in GPU HBM or system DRAM.
• Used for fast access during inference.
• Cannot be offloaded to storage during live inference because NAND-based SSDs are too slow.
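To make the "physical memory" point concrete, the sketch below uses the standard size estimate for a transformer KV cache: one key tensor and one value tensor per layer, each holding one vector per KV head per token. The model shape is a hypothetical example, not a description of any specific model or of TurboQuant's internals.

```python
def kv_cache_bytes(
    num_layers: int,
    num_kv_heads: int,
    head_dim: int,
    seq_len: int,
    batch_size: int,
    bytes_per_element: int,
) -> int:
    """Estimate KV cache size: 2 tensors (K and V) per layer, per token."""
    return (
        2  # keys and values
        * num_layers
        * num_kv_heads
        * head_dim
        * seq_len
        * batch_size
        * bytes_per_element
    )


# Hypothetical model shape, 16-bit (2-byte) cache entries, one 8K-token sequence.
size = kv_cache_bytes(
    num_layers=32,
    num_kv_heads=8,
    head_dim=128,
    seq_len=8192,
    batch_size=1,
    bytes_per_element=2,
)
print(f"{size / 2**30:.2f} GiB")
```

For this assumed shape the cache comes to 1 GiB per 8K-token sequence, and it scales linearly with both sequence length and batch size.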
Depending on workload, KV cache represents:
• 10-30% of memory in smaller workloads.
• 30-60% in...