The prospects for FPGAs in image processing keep growing

Last Update Time: 2019-07-15 10:52:22

One of the most important advantages of using FPGAs for image processing is that an FPGA can run a real-time pipeline and so deliver the lowest and most predictable latency. In applications with very strict real-time requirements, an FPGA is therefore often the only practical choice. Sorting equipment is a typical example: image processing there is almost always done on FPGAs, because only a few milliseconds are available between the camera seeing the material and the execution instruction being issued. The processing must be fast and its delay must be fixed, and only the real-time pipeline that an FPGA provides can meet this requirement.

Therefore, to understand the advantages of FPGA image processing, it is necessary to understand how the FPGA's real-time pipeline differs from the way DSPs, GPUs and CPUs process images. DSPs, GPUs and CPUs basically process images in units of frames. The image data collected from the camera is first stored in memory, and the processor then reads it back from memory for processing. If images are captured at 30 frames per second, a DSP or GPU that finishes processing one frame within 1/30 of a second can basically be regarded as processing in real time.
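
The frame budget above is simple arithmetic, and a small Python sketch makes the consequence visible. The function names are illustrative, and the last function assumes the sensor takes one full frame period to deliver a frame before processing can start, which is a simplification rather than a property of any particular camera.

```python
def frame_period_ms(fps):
    """Time between consecutive frames, in milliseconds."""
    return 1000.0 / fps


def keeps_up(fps, processing_ms):
    """True if a frame-based processor finishes one frame before the next arrives."""
    return processing_ms <= frame_period_ms(fps)


def capture_to_result_ms(fps, processing_ms):
    """Rough capture-to-result latency for store-then-process operation.

    Assumption: the whole frame must be in memory before processing begins,
    and delivering it takes one frame period.
    """
    return frame_period_ms(fps) + processing_ms


if __name__ == "__main__":
    print(round(frame_period_ms(30), 2))           # about 33.33 ms per frame
    print(keeps_up(30, 20.0))                      # True: 20 ms fits in the budget
    print(round(capture_to_result_ms(30, 20.0), 2))
```

Under these assumptions a frame-based processor can keep up with a 30 frames-per-second camera and still deliver each result tens of milliseconds after capture, which is far more than the few milliseconds available in the sorting example.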


The FPGA, by contrast, runs its real-time pipeline on the image in units of rows (lines). It can connect directly to the image sensor chip and receive the image as a data stream; if the data is in RAW format, the FPGA can also interpolate (demosaic) it to obtain RGB image data. The key to FPGA real-time pipeline processing is that several lines of image data can be buffered in the FPGA's internal Block RAM. Block RAM is somewhat like the cache in a CPU, except that you cannot fully control a cache, whereas Block RAM is fully under your control and can be used to implement all kinds of flexible operations. By buffering a few lines of image data, the FPGA can process the image in real time: data is processed as it flows through, and does not need to be written to a DDR buffer and read back before processing.
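
The line-buffer idea can be modeled in software. The following Python sketch is a behavioral model only, not hardware code: two lists stand in for two Block RAM line buffers, and a 3x3 list stands in for the shift-register window. The function name `stream_windows_3x3` and the raster-order input are assumptions made for illustration.

```python
def stream_windows_3x3(pixels, width):
    """Yield (row, col, window) for each full 3x3 neighbourhood of a pixel stream.

    `pixels` arrives in raster order, one pixel at a time. Only two image
    lines are ever stored, however tall the image is.
    """
    prev2 = [0] * width  # line buffer holding row r-2
    prev1 = [0] * width  # line buffer holding row r-1
    window = [[0] * 3 for _ in range(3)]  # 3x3 shift-register window

    for i, p in enumerate(pixels):
        row, col = divmod(i, width)
        # New right-hand column: same column position in rows r-2, r-1, r.
        column = (prev2[col], prev1[col], p)
        for r in range(3):
            window[r] = [window[r][1], window[r][2], column[r]]
        # Shift the line buffers up by one row at this column.
        prev2[col], prev1[col] = prev1[col], p
        # The window is valid once two full lines and two pixels have passed.
        if row >= 2 and col >= 2:
            yield row - 1, col - 1, [r[:] for r in window]


if __name__ == "__main__":
    # 4x4 test image whose pixel values equal their raster index.
    for r, c, w in stream_windows_3x3(range(16), 4):
        print((r, c), w)
```

Each window is emitted as soon as its last pixel arrives, so nothing in the model ever waits for a complete frame.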


Such stream processing obviously reads data sequentially, so it suits only algorithms that access data sequentially. These are the large class of 3x3 to NxN operators in image processing, used for filtering, edge extraction, dilation, erosion and similar algorithms. These may look like the most basic image processing operations, mere front-end preprocessing of little use. But FPGAs perform exactly these operations faster and more efficiently than anything else. For example, a CPU running an edge-detection algorithm generally cannot reach this kind of real-time performance. Do not underestimate the NxN operator approach either: operators can be combined and applied in many ways, to recognize colors and even to distinguish simple shapes. On an FPGA, such an operator is a parallel pipelined algorithm with a fixed delay; for a 3x3 operator, for example, the result is available after a delay of two image lines. The operator is also similar to the first convolutional layer of a convolutional neural network.
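
To make the operator family concrete, here is a minimal Python sketch of four 3x3 operators, each taking one window (a list of three rows of three grayscale values) and returning one output pixel. The text does not name specific kernels, so the box filter and the Sobel kernels used here are illustrative choices, and the erosion and dilation assume a full 3x3 structuring element.

```python
def box_filter(w):
    """Smoothing: integer average of the nine pixels."""
    return sum(sum(row) for row in w) // 9


def sobel_edge(w):
    """Edge strength as |Gx| + |Gy| using the Sobel kernels (an assumed choice)."""
    gx = (w[0][2] + 2 * w[1][2] + w[2][2]) - (w[0][0] + 2 * w[1][0] + w[2][0])
    gy = (w[2][0] + 2 * w[2][1] + w[2][2]) - (w[0][0] + 2 * w[0][1] + w[0][2])
    return abs(gx) + abs(gy)


def erode(w):
    """Grayscale erosion: minimum over the window."""
    return min(min(row) for row in w)


def dilate(w):
    """Grayscale dilation: maximum over the window."""
    return max(max(row) for row in w)


if __name__ == "__main__":
    step = [[0, 0, 255]] * 3  # vertical edge on the right of the window
    flat = [[7, 7, 7]] * 3    # uniform region
    print(sobel_edge(step), sobel_edge(flat))
    print(box_filter(flat), erode(step), dilate(step))
```

Every operator touches only the nine pixels of its window, which is why each one maps onto a small block of parallel hardware with a fixed delay.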

Block RAM is an important and scarce resource in an FPGA. The number of image lines that can be buffered is limited, so the N of an NxN operator cannot be very large. The FPGA can of course also be connected to DDR memory, buffer the image there and read it back for processing. That mode of operation, however, is almost the same as a CPU's and does not achieve the highest real-time performance. In fact, some image processing algorithms that we assume need random access to the data can also be implemented as parallel pipelines.
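
A rough budget shows why N is limited. The Python sketch below assumes that an NxN operator buffers N-1 full lines, consistent with the two-line delay quoted above for a 3x3 operator. The image width, pixel depth and pixel clock in the usage lines are hypothetical values, and horizontal blanking and the few pixels needed to fill the window are ignored.

```python
def line_buffer_bits(n, width, bits_per_pixel):
    """On-chip memory needed to hold the N-1 buffered lines, in bits."""
    return (n - 1) * width * bits_per_pixel


def operator_latency_us(n, width, pixel_clock_mhz):
    """Approximate operator delay in microseconds: N-1 line times.

    Assumes one pixel per clock and no blanking between lines.
    """
    return (n - 1) * width / pixel_clock_mhz


if __name__ == "__main__":
    # Hypothetical stream: 1920 pixels per line, 8 bits per pixel, 100 MHz clock.
    for n in (3, 7, 15):
        print(n, line_buffer_bits(n, 1920, 8), operator_latency_us(n, 1920, 100.0))
```

Both quantities grow linearly with N: under these assumptions a 3x3 operator needs 30,720 bits of line buffer, while a 15x15 operator needs 215,040 bits for a single 8-bit channel.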


In compute-intensive work, what costs time and power is often not the arithmetic itself but moving data to and from memory. A GPU or CPU must fetch data from memory to compute on it and write the results back afterwards. Memory bandwidth therefore often becomes the bottleneck for computing speed, and data movement accounts for a significant share of power consumption. An FPGA can instead lay out the required operations as a chain of dedicated computing hardware and let the data flow through it. Once one stage finishes, its output flows directly into the next stage; there is no need to write the data back to memory after each stage, read it out again and hand it to the next stage. This saves a lot of time and power. FPGA image processing works exactly this way today. For example, to filter with one 3x3 operator and then extract edges with another, the FPGA pipeline passes each filtered pixel straight to the edge operator, without storing it in memory and reading it back as a CPU would.
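
The filter-then-edge example can be modeled with chained Python generators. This is a software analogy under stated assumptions, not FPGA code: `stage_3x3` is a hypothetical helper that keeps only two line buffers per stage, the box filter and Sobel kernels are illustrative choices, and only full windows are emitted, so each stage narrows the line by two pixels.

```python
def stage_3x3(pixels, width, op):
    """Apply `op` to every full 3x3 window of a raster-order pixel stream.

    Yields a raster-order stream with width-2 pixels per line. Only two
    lines are buffered; no frame is ever stored.
    """
    prev2 = [0] * width
    prev1 = [0] * width
    win = [[0] * 3 for _ in range(3)]
    for i, p in enumerate(pixels):
        row, col = divmod(i, width)
        column = (prev2[col], prev1[col], p)
        for r in range(3):
            win[r] = [win[r][1], win[r][2], column[r]]
        prev2[col], prev1[col] = prev1[col], p
        if row >= 2 and col >= 2:
            yield op(win)


def box_filter(w):
    return sum(map(sum, w)) // 9


def sobel_edge(w):
    gx = (w[0][2] + 2 * w[1][2] + w[2][2]) - (w[0][0] + 2 * w[1][0] + w[2][0])
    gy = (w[2][0] + 2 * w[2][1] + w[2][2]) - (w[0][0] + 2 * w[0][1] + w[0][2])
    return abs(gx) + abs(gy)


def pipeline(pixels, width):
    """Stage 1 output feeds stage 2 directly, one pixel at a time."""
    smoothed = stage_3x3(pixels, width, box_filter)
    return stage_3x3(smoothed, width - 2, sobel_edge)


if __name__ == "__main__":
    # 6x6 test image with a vertical step from 0 to 90.
    image = [0, 0, 0, 90, 90, 90] * 6
    print(list(pipeline(iter(image), 6)))
```

Because generators are lazy, the second stage consumes each smoothed pixel as soon as the first stage produces it, which mirrors the way pipeline stages hand data to each other on the FPGA without a round trip through external memory.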

The prospects for FPGA image processing remain broad: more and more industrial applications demand higher real-time performance, which is exactly what FPGAs are suited to. There is also the field of machine learning, such as layered neural