How to achieve the lowest failure rate in a NAND-based flash system?

Last Update Time: 2020-09-24 10:50:47

As the process geometry of NAND flash memory shrinks, the raw bit error rate continues to increase, which raises the risk of system failures. Anyone who understands the basics of SD cards, USB flash drives, and other NAND flash-based solutions knows that the key component for keeping failure rates low is the NAND flash controller. You may be familiar with this component and may have discussed its error correction code (ECC) strength. But have you ever wondered what exactly is inside this small package, and what the flash controller does to prevent failures? ECC is only one of several building blocks. In a well-designed system, reliability and error prevention features are layered throughout the design, with ECC reserved for the bit errors that cannot be avoided.

Even before the system is assembled, whether in-house or by a system integrator, there is an important planning step: flash qualification. In other words, the flash controller must be paired with the right flash memory. So what exactly does qualification mean? It does not only mean that the controller works with the selected flash memory. Most importantly, it means testing, and a lot of it. At WorldPay, we ensure that our portfolio has been thoroughly tested. The first step is to characterize the flash memory itself. Characterization is done through extensive testing of the NAND flash at all stages of its life cycle and under different use cases. This knowledge helps to design the error correction unit correctly, to extract the log-likelihood ratio (LLR) tables used for soft-decision error correction, and to implement the most effective overall error recovery flow.

During planning and design, most companies discuss flash memory in terms of overall cost, but many forget to consider how the flash behaves given its architecture, the environment, and the use cases it is exposed to. Each scenario requires its own handling, correction, and recovery options for best results. This characterization work matters because the data collected makes it possible to validate the design in the most accurate and efficient way. A thorough, well-thought-out qualification is the foundation of a robust and stable system. For demanding systems, it is worth asking the system integrator about the qualification process. Or, if you design the solution in-house for more flexibility, consult the controller company directly. While a solid qualification sets up a successful system, calibration and the controller's other features act more directly to prevent errors.

An effective calibration process can keep the bit error rate low throughout the device's lifetime by dynamically adapting to changes in the threshold voltages of the memory cells. Many disturbances can shift a cell's threshold voltage: program and erase cycles, read disturb, data retention, temperature changes, and so on. The flash does not track these threshold changes automatically. Instead, the flash memory controller determines when calibration is needed and executes the appropriate sequence of operations.

Calibration adjusts the reference voltages used to read the cells. Because different blocks or pages may experience different disturbances, the best calibration for one page may not apply to another.
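As a rough illustration of per-page calibration, the sketch below sweeps a set of candidate reference-voltage offsets and keeps the one that produces the fewest bit errors. It is only a minimal sketch under stated assumptions: `count_bit_errors`, the offset range, and the toy page model are hypothetical and do not describe any particular controller or flash command set.

```python
from typing import Callable, Sequence


def calibrate_read_offset(
    count_bit_errors: Callable[[int], int],
    candidate_offsets: Sequence[int],
) -> int:
    """Return the reference-voltage offset that yields the fewest bit errors.

    count_bit_errors(offset) is a hypothetical hook: it re-reads one page
    with the given offset applied and returns the number of bit errors
    reported by the ECC unit.
    """
    best_offset = candidate_offsets[0]
    best_errors = count_bit_errors(best_offset)
    for offset in candidate_offsets[1:]:
        errors = count_bit_errors(offset)
        if errors < best_errors:
            best_offset, best_errors = offset, errors
    return best_offset


# Toy page model (assumption): the threshold distribution has drifted so
# that the fewest errors now occur at offset -2, in arbitrary units.
def toy_page(offset: int) -> int:
    return 5 + 3 * abs(offset + 2)


best = calibrate_read_offset(toy_page, range(-4, 5))
print(best)
```

Because the sweep runs per page, a second page with a different drift would simply end up with a different offset, which matches the point that one page's calibration may not suit another.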

In addition, error prevention mechanisms such as wear leveling, read disturb management, near-miss ECC, and dynamic data refresh work together to manage the efficient and reliable transfer of data to and from flash memory. Wear leveling ensures that all blocks in a flash device or storage system approach their specified erase cycle budget at about the same time, rather than some blocks reaching it much earlier than others. Near-miss ECC refreshes any data read by the application whose error count exceeds a configured threshold, while dynamic data refresh reads all data as a background operation and identifies the error status of every block. Different controller companies usually give these functions different names and implement the logic and algorithms behind them differently, but they pursue a common goal and reach it in different ways. It is worth building a close relationship with your supplier to understand how these features work with the qualified flash.
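The two ideas above, wear leveling and near-miss refresh, can be sketched in a few lines. This is an illustrative sketch only: the `Block` structure, the threshold value, and the function names are hypothetical, and real controllers use considerably more elaborate policies.

```python
from dataclasses import dataclass


@dataclass
class Block:
    block_id: int
    erase_count: int = 0
    is_free: bool = True


def pick_block_for_write(blocks):
    """Wear leveling sketch: choose the free block with the fewest erases,
    so that erase counts stay close together across the device."""
    free = [b for b in blocks if b.is_free]
    if not free:
        raise RuntimeError("no free block available")
    return min(free, key=lambda b: b.erase_count)


# Hypothetical threshold: corrected bits per codeword above which data is
# considered a "near miss" and gets rewritten to a fresh location.
NEAR_MISS_THRESHOLD = 30


def needs_refresh(corrected_bits: int) -> bool:
    """Near-miss sketch: the read still succeeded, but the number of
    corrected bits exceeds the configured threshold."""
    return corrected_bits > NEAR_MISS_THRESHOLD


blocks = [
    Block(0, erase_count=120),
    Block(1, erase_count=95),
    Block(2, erase_count=40, is_free=False),  # in use, not a candidate
    Block(3, erase_count=101),
]
chosen = pick_block_for_write(blocks)
print(chosen.block_id, needs_refresh(31), needs_refresh(30))
```

A background dynamic data refresh could reuse the same `needs_refresh` check while scanning every block, instead of waiting for the application to read the data.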

Finally, error correction is one of the best-known and most important tasks of a flash memory controller. Error prevention deserves more credit than it usually gets, but the complexity and strength of error correction ultimately make it the most highly valued controller mechanism. Under area and power constraints, error correction coding becomes more and more difficult. As the demand for correction capability continues to grow, older codes can no longer deliver the required correction performance within the limited spare area available in the latest flash memory.

To provide the best solution, HPS has developed its own error correction engine, a hard-decision and soft-decision error correction module based on generalized concatenated codes. The great advantage of this code construction lies in one particular aspect: the number of correctable errors in each codeword can be determined analytically. This means that error correction can guarantee a certain level of correction performance for every codeword. For all supported flash memories, a guaranteed bit error rate is specified to ensure reliable operation within the specified parameters.
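To see why a guaranteed number of correctable errors per codeword is useful, the sketch below estimates how often a codeword would contain more errors than the code can correct. It is a generic binomial-tail calculation, not the vendor's method: it assumes independent bit errors, and the codeword size, correction capability, and raw bit error rate are hypothetical example values.

```python
import math


def codeword_failure_probability(n_bits: int, t: int, raw_ber: float) -> float:
    """Probability that an n_bits codeword contains more than t bit errors,
    assuming independent errors with probability raw_ber per bit.

    Terms are computed in the log domain to avoid overflow for large n_bits.
    """
    if not 0.0 < raw_ber < 1.0:
        raise ValueError("raw_ber must be strictly between 0 and 1")
    log_p = math.log(raw_ber)
    log_q = math.log1p(-raw_ber)
    total = 0.0
    for k in range(t + 1, n_bits + 1):
        log_binom = (
            math.lgamma(n_bits + 1)
            - math.lgamma(k + 1)
            - math.lgamma(n_bits - k + 1)
        )
        total += math.exp(log_binom + k * log_p + (n_bits - k) * log_q)
    return total


# Hypothetical example: 8192-bit codeword, up to 40 correctable errors.
for ber in (1e-4, 1e-3, 3e-3):
    print(ber, codeword_failure_probability(8192, 40, ber))
```

With a fixed, analytically known `t`, a calculation of this kind lets a designer relate a flash memory's raw bit error rate to the expected rate of uncorrectable codewords.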

When data is read back from the flash memory and passed to the error correction module, hard-decision decoding determines which bits are in error based only on the redundant information added to the codeword. Using only this information means that every bit is treated as equally likely to be correct or incorrect. Soft-decision decoding additionally uses so-called soft information, which indicates how likely it is that a received bit really has the value that was read rather than the other value. These probabilities are taken from log-likelihood ratio (LLR) tables, which are generated in advance and stored as lookup tables in the controller. With this information, error correction has more inputs: for each individual bit, the probability information indicates how likely the read value is to be correct, for example that a zero was read and the original value was zero with 74% confidence. Error correction then has a clear indication of which bits are likely to be wrong and which are less likely to be wrong. This additional information significantly increases the ability to correct errors.
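The 74% example can be turned into a log-likelihood ratio with the standard definition LLR = ln(p / (1 - p)). The sketch below is illustrative only: the sign convention (positive favors 0), the function names, and the sample values are assumptions, and a real controller would read such values from its precomputed tables rather than compute them at run time.

```python
import math


def signed_llr(read_bit: int, p_correct: float) -> float:
    """LLR for one read bit, given the probability that the read value is
    correct. Convention assumed here: positive favors 0, negative favors 1."""
    magnitude = math.log(p_correct / (1.0 - p_correct))
    return magnitude if read_bit == 0 else -magnitude


def least_reliable_positions(llrs, k):
    """Indices of the k bits with the smallest |LLR|, that is, the bits a
    soft-decision decoder would suspect first."""
    return sorted(range(len(llrs)), key=lambda i: abs(llrs[i]))[:k]


# A zero read with 74% confidence gives an LLR of roughly 1.05.
llr_74 = signed_llr(0, 0.74)
print(round(llr_74, 2))

# Hypothetical soft values for four bits of a codeword.
llrs = [2.9, -0.3, llr_74, -4.0]
print(least_reliable_positions(llrs, 2))
```

A value near zero means the read is close to a coin flip, while a large magnitude means the bit is very likely correct, which is exactly the extra input that hard-decision decoding lacks.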

The flash memory controller is a key component for reliable and secure handling of flash memory. It performs a range of functions designed to manage data transfers to and from the flash efficiently, and it not only corrects errors but also prevents them. However, these features are designed differently from vendor to vendor, and depending on a company's business model and focus, the feature set of a given controller can be minimal.

If you want to know more, our website has product specifications for NAND-based flash systems. You can visit ALLICDATA ELECTRONICS LIMITED for more information.
