SSD Performance Secrets
Compared to hard disk drives (HDDs), solid-state drives (SSDs) are fast. They're an order of magnitude faster than the fastest enterprise-class HDDs in write IOPS and two orders of magnitude faster in read IOPS. What's not to like? Well, just as HDDs deliver variable read/write performance depending on where the read/write arm happens to be relative to where it needs to be for the next operation, SSD IOPS performance also varies, but in far more complex ways. It's nothing so simple as a read/write head being in the wrong place at the wrong time. Although in a sense, that's exactly what's happening with SSDs.
Find those last two sentences confusing or contradictory? Here’s the explanation.
SSDs have no read/write heads or positioning arms. Instead, they consist of several NAND Flash chips and a controller chip. Each NAND Flash chip contains an array of memory blocks, and the block is the smallest unit of memory the chip can erase in one operation: NAND Flash blocks are atomic with respect to erasure. You can't overwrite just one byte or word, because you must erase the entire block before writing new data into it. In effect, an SSD can only rewrite an entire NAND block at a time.
Each SSD consists of a stack of visible NAND memory blocks that the SSD controller uses to store written data. There's also a shorter stack of spare NAND memory blocks that provide temporary storage; these spares are also used to replace a visible block when it wears out from repeated write/erase cycles. All NAND blocks are equally accessible, so there's no time penalty for writing NAND blocks out of sequence, as there is when writing to non-adjacent or non-contiguous tracks on an HDD.
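The spare-block replacement just described can be sketched as a toy model. This is an illustration only, not any vendor's actual firmware; the block counts and the per-block erase limit (`ERASE_LIMIT`) are made-up numbers, and real controllers use far more sophisticated wear-leveling policies.

```python
# Toy sketch of spare-block replacement: when a visible block reaches
# its (hypothetical) erase-endurance limit, the controller retires it
# and promotes a spare block in its place.
ERASE_LIMIT = 100_000  # hypothetical per-block write/erase endurance

class BlockPool:
    def __init__(self, visible, spares):
        # Map each visible block number to its erase count so far.
        self.visible = {i: 0 for i in range(visible)}
        # Spare blocks wait in reserve until a visible block wears out.
        self.spares = list(range(visible, visible + spares))

    def erase(self, block):
        """Erase a block; retire it if it just hit the endurance limit.
        Returns the block number now serving in that position."""
        self.visible[block] += 1
        if self.visible[block] >= ERASE_LIMIT and self.spares:
            fresh = self.spares.pop()   # promote a spare block
            del self.visible[block]     # retire the worn-out block
            self.visible[fresh] = 0
            return fresh
        return block

pool = BlockPool(visible=8, spares=2)
pool.visible[3] = ERASE_LIMIT - 1       # block 3 is one erase from wearing out
replacement = pool.erase(3)
print(replacement)  # -> 9 (a spare has replaced worn-out block 3)
```

Once the spare pool is exhausted, a real drive's usable capacity or performance degrades, which is one reason SSDs over-provision spare blocks in the first place.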
However, most virtual-memory operating systems don't write in blocks; they write in 4-Kbyte pages that are much smaller than NAND Flash blocks. For example, Numonyx's 1-to-16-Gbit NAND Flash devices have 128-Kbyte blocks. As a result, modifying one 4-Kbyte page in a NAND Flash block requires a relatively complex sequence:
- Read the data for the entire block from NAND Flash into a RAM buffer
- Modify the appropriate page in the block image now stored in RAM
- Write the block back to an erased NAND Flash block
- Fix pointers to the new memory block
- Erase the old memory block as a background task
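The five steps above can be sketched as a toy flash model. This is a minimal illustration, not real controller firmware: the class and method names are invented, the sizes match the Numonyx example in the text (128-Kbyte blocks, 4-Kbyte pages), and the "background" erase is done inline for simplicity.

```python
# Illustrative sketch of the read-modify-write sequence for updating
# one 4-Kbyte page inside a 128-Kbyte NAND Flash block.
BLOCK_SIZE = 128 * 1024
PAGE_SIZE = 4 * 1024
PAGES_PER_BLOCK = BLOCK_SIZE // PAGE_SIZE  # 32 pages per block

class SimpleFlash:
    """Toy model: blocks are erased wholesale, never overwritten in place."""
    def __init__(self, num_blocks):
        self.blocks = [None] * num_blocks   # None means "erased"
        self.erase_counts = [0] * num_blocks

    def modify_page(self, block_no, page_no, new_page):
        # 1. Read the data for the entire block into a RAM buffer.
        ram = list(self.blocks[block_no])
        # 2. Modify the appropriate page in the RAM image.
        ram[page_no] = new_page
        # 3. Write the block image back to an erased NAND block.
        spare = self.blocks.index(None)
        self.blocks[spare] = ram
        # 4. Fix pointers: the logical block now lives at `spare`.
        # 5. Erase the old block (a real SSD does this in the background).
        self.blocks[block_no] = None
        self.erase_counts[block_no] += 1
        return spare

flash = SimpleFlash(num_blocks=4)
flash.blocks[0] = ["page%d" % i for i in range(PAGES_PER_BLOCK)]
new_home = flash.modify_page(0, 5, "updated")
print(flash.blocks[new_home][5])  # -> updated
print(flash.blocks[0])            # -> None (old block has been erased)
```

Note that a single 4-Kbyte logical write turned into a 128-Kbyte read, a 128-Kbyte program, and a block erase, which is why small writes are so much more expensive than the raw chip specs suggest.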
Consequently, SSD performance varies over time, and it varies depending on how many erased and spare memory blocks are available across all of the NAND Flash chips in the SSD. SSD performance also depends on the ratio of reads to writes, because reads occur roughly ten times faster than writes on SSDs, and both factors shift over time as the NAND Flash chips fill up.
The following figure from Handy’s keynote shows a 3D data surface plot representing the IOPS performance of one SSD. (The figure is from a presentation at the August 2009 Flash Memory Summit made by Esther Spanjer, Director of SSD Marketing at Smart Modular Technologies.)
The X axis of the surface shows the ratio of reads to writes and varies from 100% writes on the left to 100% reads on the right. The Y axis shows SSD performance in IOPS. The Z axis plots "block" size from the SSD-level perspective (which is page size from the NAND Flash chip's perspective; yes, that's confusing).
The first thing to note from this surface plot is that performance is far better on the right-hand side, which is dominated by reads. You'd expect that because SSD read performance is 10x better than SSD write performance; it's the nature of NAND Flash memory. Note how fast performance falls off as the percentage of write transactions increases. Then note that there's a sort of saddle effect along the Z axis. The saddle peaks at 4-Kbyte blocks. Most SSD designs are optimized for 4-Kbyte blocks because most virtual-memory operating systems employ 4-Kbyte pages (and have for decades, in spite of the radical, orders-of-magnitude increase in memory use by both operating systems and application software).
So, clearly, when an SSD vendor gives an IOPS rating for an SSD, you need to take that one number with a grain of salt. SSD performance varies significantly depending on the read/write mix and on block size. Consequently, SSD performance can’t be captured in one or two numbers.
Next, Handy presented this graphic from SandForce (which makes SSD controller chips):
This graph shows an initial conditioning period during which the test preconditions (fills up) the SSD using sequential 128-Kbyte writes. The initial transfer performance (about 80 Mbytes/sec for the particular drive being tested) drops slightly as the drive fills and the internal SSD controller starts shuffling full NAND Flash blocks off to spare memory. The falloff isn't big because the sequential writes place a predictable load on the SSD controller. However, when the test switches to random 4-Kbyte writes about 4000 seconds into the test, performance drops significantly. The SSD controller suddenly needs to make small changes to data stored in the NAND Flash blocks, but the drive is full and there are no empty blocks. Blocks must be erased to make room for the new data, and block erasure takes time. Consequently, there's a big performance falloff as the controller starts shuffling data around inside the drive to make room for new data.
Perhaps more interesting is what happens when the test switches back to large sequential writes about 11,000 seconds into the test. Initially, the sequential writes cause the drive performance to vary wildly because the preceding random writes have scattered the spare blocks and left them distributed throughout the SSD’s internal NAND Flash memory space. Eventually, the SSD’s internal controller gets things sorted out and the performance for large sequential writes returns to the initial steady-state level.
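A back-of-envelope model shows why random small writes hurt so much on a full drive. This is a deliberate simplification with invented function names, assuming the worst case from the text: every random 4-Kbyte write lands in a different full block and forces a read-modify-write (and an erase), while sequential writes fill a whole block before any erase is needed.

```python
# Toy comparison of erase load: sequential 128-Kbyte writes versus
# random 4-Kbyte writes on a completely full drive (worst case).
PAGES_PER_BLOCK = 32  # 128-Kbyte blocks / 4-Kbyte pages, as in the article

def erases_needed(page_writes, sequential):
    """Count block erases for `page_writes` 4-Kbyte page writes.
    Sequential: 32 pages accumulate before one block must be erased.
    Random (worst case): every page write hits a different full block,
    so each one triggers a read-modify-write and a block erase."""
    if sequential:
        return page_writes // PAGES_PER_BLOCK
    return page_writes

seq = erases_needed(3200, sequential=True)
rnd = erases_needed(3200, sequential=False)
print(seq, rnd)  # -> 100 3200: 32x the erase work for random writes
```

Real controllers blunt this worst case with over-provisioned spare blocks and background garbage collection, which is exactly why the graph recovers only after the controller has had time to tidy up.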
(Note: This graph is not supposed to typify the performance of all SSDs. The graph shows the results of a test on one particular SSD.)
So what’s to be learned from all of this data? SSD performance measurement isn’t simple. Creating controllers and firmware that deliver optimum SSD performance isn’t simple either. As drive and chip vendors learn more about the use of NAND Flash for storage, they develop better algorithms for extracting more performance from the NAND Flash chips.
NAND Flash chips are complicated, whether used in SSDs or for server memory backup as with AgigA Tech’s AGIGARAM modules. It takes experience to get the most performance from these memory devices.
My thanks to Jim Handy for all of the great information in his Bell Micro keynote, and for generously letting me use the information in this series of blog entries.