DDR3 Clock Rate and Performance – Exhaustive Testing Results from Xbit Labs
Memory vendors offer four speed grades of DDR3 memory: DDR3-800, DDR3-1067, DDR3-1333, and DDR3-1600. Ever wonder how much effect SDRAM clock rate has on performance? Well, so did Ilya Gavrichenkov, the Hardware Editor at, and co-founder of, Xbit Laboratories. He’s just published an article titled “Choosing DDR3 SDRAM for LGA1156 Platform” with exhaustive tests of the various DDR3 SDRAM speed grades on an Intel Core i7-based platform. Now Xbit Labs has an overclocker orientation and is not really focused on server design, but the results are instructive nevertheless.
Here’s the testbed that Gavrichenkov used for his tests:
- CPU: Intel Core i7-860 (Lynnfield, 2.80 GHz, 4 x 256 KB L2, 8 MB L3)
- Mainboard: ASUS P7P55D Premium (LGA1156, Intel P55 Express)
- System memory: 2 x 2 GB, DDR3-1600 SDRAM (Kingston HyperX KHX1600C8D3K2/4GX, Corsair Dominator CMD4GX3M2A1600C8)
- Graphics card: ATI Radeon HD 5870
- HDD: Western Digital VelociRaptor WD3000HLFS
- PSU: Tagan TG880-U33II (880 W)
- OS: Microsoft Windows 7 Ultimate x64
Using memory-intensive synthetic benchmarks, Gavrichenkov did observe some performance differences between DDR3-1067 and DDR3-1600 SDRAM. Although the DDR3-1067 clock and transfer rates are 33% lower than those of DDR3-1600, Gavrichenkov observed at most an 18% performance difference between the two DDR3 SDRAM speed grades. Results from a multithreaded synthetic benchmark called MaxMEM2 showed that DDR3-1600 SDRAM gave a maximum of 40% more performance than DDR3-1067 SDRAM, suggesting that multithreaded processor workloads get more benefit from faster SDRAM transfer rates. Published results for non-synthetic video-transcoding, x264 video-encoding, and file-compression benchmarks seem to verify the synthetic benchmark results, at least qualitatively. The Intel Core i7 processor does get some benefit from the faster SDRAM in the benchmarks based on real-world applications.
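Where do those percentages come from? A quick sketch of the arithmetic: a DDR3 speed grade's number is its transfer rate in megatransfers per second, and each channel moves 8 bytes per transfer, so peak bandwidth per channel is simply the rate times eight. The script below (an illustrative calculation, not part of the Xbit Labs test suite) works out the per-channel peaks and the relative shortfall of DDR3-1067:

```python
# Theoretical peak bandwidth per DDR3 channel:
# transfer rate (MT/s) x 8-byte (64-bit) bus width = MB/s per channel.
def peak_bandwidth_mb_s(transfer_rate_mt_s):
    return transfer_rate_mt_s * 8

grades = {"DDR3-800": 800, "DDR3-1067": 1067,
          "DDR3-1333": 1333, "DDR3-1600": 1600}

for name, rate in grades.items():
    print(f"{name}: {peak_bandwidth_mb_s(rate)} MB/s per channel")

# Relative transfer-rate shortfall of DDR3-1067 vs. DDR3-1600: about 33%.
shortfall = (1600 - 1067) / 1600
print(f"DDR3-1067 is {shortfall:.0%} slower in peak transfer rate")
```

DDR3-1600 peaks at 12,800 MB/s per channel against 8,536 MB/s for DDR3-1067, yet the observed application-level gap topped out well below that 33% raw-bandwidth deficit.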
To some extent, these results should not surprise anyone familiar with the Intel Core i7 multicore processor architecture. The chip carries four multithreaded processor cores. Each processor core has private L1 and L2 caches, and all four cores share a large, 8-Mbyte, 16-way associative L3 cache called the “last-level cache” or LLC. Two SDRAM channels are run by an on-chip Integrated Memory Controller (IMC), which manages the traffic between the LLC and the attached SDRAM.
The LLC serves as a huge buffer between the Core i7 processor’s multiple processor cores and the SDRAM channels, and it makes sense that the LLC can damp down the performance differences among DDR3 SDRAM speed grades in single-threaded environments. That’s what a good cache does. It also makes sense that the buffering job gets harder when multithreading is involved, because the memory accesses become less correlated and therefore too messy to cache cleanly.
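That damping effect falls out of the standard average-memory-access-time (AMAT) formula: when the LLC hit rate is high, SDRAM latency is multiplied by a tiny miss fraction and barely moves the average. The sketch below illustrates this with made-up hit rates and latencies (the 97%/85% hit rates and nanosecond figures are illustrative assumptions, not measurements from the article):

```python
# AMAT = hit_rate * cache_latency + (1 - hit_rate) * DRAM_latency.
# All hit rates and latencies below are illustrative assumptions.
def amat_ns(llc_hit_rate, llc_latency_ns, dram_latency_ns):
    return llc_hit_rate * llc_latency_ns + (1 - llc_hit_rate) * dram_latency_ns

# Single-threaded: well-correlated accesses cache cleanly (assume 97% LLC hits).
fast_st = amat_ns(0.97, 12, 50)  # faster SDRAM
slow_st = amat_ns(0.97, 12, 70)  # slower SDRAM
print(f"single-threaded AMAT: {fast_st:.2f} vs {slow_st:.2f} ns")

# Multithreaded: interleaved access streams miss more often (assume 85% hits).
fast_mt = amat_ns(0.85, 12, 50)
slow_mt = amat_ns(0.85, 12, 70)
print(f"multithreaded AMAT:   {fast_mt:.2f} vs {slow_mt:.2f} ns")
```

With these toy numbers, a 40% swing in SDRAM latency changes single-threaded AMAT by only a few percent but multithreaded AMAT by several times that, which is qualitatively the pattern Gavrichenkov measured.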
Do the Xbit Labs results hold true in a server environment? Good question. The results suggest that server designers ought to be running tests of their own. At least for servers based on the Nehalem architecture (Intel’s Core i7, Core i5, Core i3, and Xeon processors), that big on-chip LLC could translate into significant memory-cost savings.