Driven by Moore’s Law, integrated circuit technology has developed rapidly, and the number of transistors per unit area has continued to increase. The System-on-Chip (SoC) offers high integration, low power consumption, and low cost. It has become the mainstream direction of large-scale integrated circuit system design and has solved many challenging problems in communication, imaging, computing, consumer electronics, and other fields. As SoC application requirements grow more varied, an SoC needs to integrate more and more IP (Intellectual Property) blocks for different applications. In addition, the multiprocessor system-on-chip (MPSoC, MultiProcessor System-on-Chip) has become an inevitable development trend.
With the high integration of SoCs and the rapid development of MPSoCs, on-chip communication faces higher requirements, and Network-on-Chip (NoC) technology has been widely adopted in response. In essence, it provides an on-chip communication solution for data transmission between different IPs or different cores in the chip.
Network-on-chip technology has a history of more than 20 years since its invention and has been widely used in SoCs. Given the high bandwidth and low latency of the on-chip network, mainstream FPGA companies have also begun to use NoCs in high-end FPGAs to meet the high-bandwidth requirements of data transmission. Achronix’s new-generation Speedster 7t, built on a 7nm process, is one of the first high-end FPGAs to integrate a NoC, as shown in Figure 1.
Figure 1 Speedster 7t FPGA structure diagram
2. The development of on-chip interconnect architecture
The on-chip interconnect architecture has gone through three main stages: shared bus, crossbar, and network-on-chip (NoC).
(1) The traditional SoC on-chip communication structure generally uses a shared bus. In the shared bus structure, all processors and IP modules share one or more buses. When multiple processors access a bus at the same time, an arbitration mechanism is needed to determine the ownership of the bus. A shared-bus communication system is generally simple in structure and cheap in hardware. However, its bandwidth is limited and cannot scale as more IPs are added. ARM’s AMBA bus, introduced in 1996, has been widely used as the on-chip bus of embedded microprocessors and has become a de facto industry standard.
Figure 2 Typical AMBA bus system
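The arbitration behavior of a shared bus can be illustrated with a minimal sketch. It assumes a round-robin policy and one transfer per cycle; the article does not name an arbitration policy, and the function name and request format here are hypothetical, not part of AMBA.

```python
from collections import deque


def shared_bus_schedule(requests):
    """Grant the single shared bus to one master per cycle, round-robin.

    requests: list of deques, one per master, each holding target names.
    Returns a list of (cycle, master, target) grants.
    """
    num_masters = len(requests)
    grants = []
    cycle = 0
    last = num_masters - 1  # so that master 0 is considered first
    while any(requests):
        # Search for the next master with a pending request,
        # starting just after the master that was granted last.
        for offset in range(1, num_masters + 1):
            m = (last + offset) % num_masters
            if requests[m]:
                grants.append((cycle, m, requests[m].popleft()))
                last = m
                break
        cycle += 1
    return grants


# Three masters; only one of them can own the bus in any cycle.
pending = [deque(["mem", "uart"]), deque(["dma"]), deque()]
print(shared_bus_schedule(pending))
# [(0, 0, 'mem'), (1, 1, 'dma'), (2, 0, 'uart')]
```

Even though the three requests go to three different targets, they are served one after another, which is the bandwidth limit described above.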
(2) With a traditional shared bus, an arbitration mechanism must determine bus ownership even when multiple processors access different IPs at the same time, so the bus becomes a bottleneck. The biggest problem is access latency. To support simultaneous access by multiple processors and increase the bandwidth of the entire system, a new solution, the crossbar, was introduced. Figure 3 shows a typical crossbar structure.
A crossbar allows multiple communications to proceed in real time at the same time. As long as two masters are not accessing the same target device, no arbitration is required, which greatly reduces the bottleneck caused by arbitration. However, as the number of devices increases, the size of the crossbar grows rapidly, roughly with the square of the port count. For this reason, bridge devices are usually used to cascade multiple crossbars to support more devices, but the bridge may itself become the system bottleneck and increase the transmission delay.
Figure 3 Typical unidirectional 8×8 Crossbar
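The trade-off between the shared bus and the crossbar can be sketched in a few lines. This is a simplified model, assuming every transfer takes one cycle and each master issues a single request; the function names are hypothetical.

```python
from collections import Counter


def bus_cycles(requests):
    """A shared bus serves one request per cycle, whatever the target."""
    return len(requests)


def crossbar_cycles(requests):
    """A crossbar only serializes requests that hit the same target."""
    per_target = Counter(target for _master, target in requests)
    return max(per_target.values(), default=0)


def crosspoints(num_masters, num_targets):
    """A full crossbar needs one switch point per master/target pair."""
    return num_masters * num_targets


# (master, target) pairs issued in the same cycle.
reqs = [(0, "ddr"), (1, "sram"), (2, "ddr"), (3, "uart")]
print(bus_cycles(reqs))       # 4
print(crossbar_cycles(reqs))  # 2, only the two "ddr" requests conflict
print(crosspoints(8, 8))      # 64
print(crosspoints(16, 16))    # 256
```

Doubling the port count from 8×8 to 16×16 quadruples the number of crosspoints, which is why large systems cascade smaller crossbars through bridges.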
In practical applications, a combination of crossbar and shared bus is also common, with a bridge connecting the crossbar network and the shared bus network. Figure 4 shows a typical hybrid topology.
Figure 4 Typical mixed topology network
(3) The network-on-chip (NoC) brings a new method of on-chip communication that performs significantly better than the traditional bus and crossbar, and it is a more scalable design. In the NoC architecture, each module is connected to an on-chip router. The data transmitted by a module is formed into packets, which the routers forward to the target module. Figure 5 shows a typical NoC structure, where R represents a router. All routers can be synchronous with one another, while the PE (Processing Element) connected to each router is asynchronous with that router and forms its own clock domain. NoC-based systems are therefore well suited to the globally asynchronous, locally synchronous (GALS) clocking used in complex multi-core SoC designs. In addition, a NoC can support various extended functions, such as flow control and quality of service (QoS). These properties make the NoC the best-suited interconnection mechanism for multi-core systems.
Figure 5 Typical network-on-chip (NoC) structure
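How a packet travels from router to router can be illustrated with a small sketch. The article does not say which routing algorithm the routers use; dimension-order (XY) routing on a 2D mesh is assumed here purely for illustration, and the coordinates and function name are hypothetical.

```python
def xy_route(src, dst):
    """Return the router coordinates a packet visits from src to dst.

    The packet first moves along the X dimension until the column
    matches the destination, then along the Y dimension.
    """
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


route = xy_route((0, 0), (2, 1))
print(route)           # [(0, 0), (1, 0), (2, 0), (2, 1)]
print(len(route) - 1)  # 3 hops
```

Each module only needs a link to its local router, so adding modules extends the mesh instead of widening a central switch, which is where the scalability of the NoC comes from.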
3. Application of NoC in high-end FPGA
FPGAs are playing an increasingly important role as the demand for data acceleration grows. To meet the needs of high-performance applications in cloud computing and edge computing, the FPGA, as a programmable and customizable high-performance device, has gradually become a fast way to deploy high-throughput data acceleration. At the same time, these acceleration applications place higher requirements on high-end FPGAs, such as high computing power, high-bandwidth data transmission, and high-bandwidth memory.
Network-on-chip technology has been widely used in SoCs with good results, but it has only gradually been adopted in FPGAs in recent years. Achronix created the Speedster 7t FPGA to maximize system throughput, and it innovatively applies a two-dimensional network-on-chip (2D NoC) to the FPGA. The 2D NoC carries high-speed data between the processing units in the logic array and the various on-chip high-speed interfaces and memory interfaces, which maximizes the throughput of data-intensive applications. FPGAs with a network-on-chip bring advantages that traditional FPGAs cannot match, and they are well placed to play a major role in data acceleration applications.
4. Advantages of the NoC for the Speedster 7t FPGA
The Achronix Speedster 7t FPGA has SerDes that support a single-channel rate of 112Gbps, a 400G Ethernet MAC, a PCIe Gen5 controller, and a GDDR6 controller with a bandwidth of up to 4Tbps, providing high-bandwidth I/O interfaces and high-bandwidth memory for various data acceleration applications. In this type of application, a large amount of data enters the FPGA for processing, and the processed data is then output from the FPGA. Therefore, in addition to the FPGA’s computing power, the data movement speed directly determines device performance and the user experience. To improve the data transmission rate, Achronix designed a network-on-chip for the Speedster 7t FPGA that differs from the data movement channels of traditional FPGAs, as shown in Figure 6. This innovative, high-bandwidth two-dimensional network-on-chip (2D NoC) spans the FPGA logic array both horizontally and vertically. It connects to all of the FPGA’s high-speed interfaces and high-bandwidth memory interfaces, and it can also serve as an interconnect between internal logic blocks.
Figure 6 Speedster 7t Network on Chip (NoC) structure
The two-dimensional network-on-chip (2D NoC) on the Speedster 7t FPGA is not built from programmable logic but implemented in hardened ASIC logic. It runs at a fixed frequency of 2GHz, and each row or column of the NoC is implemented as two unidirectional 256-bit paths that together form a bidirectional path. Each direction can therefore provide 512Gbps of bandwidth (256 bits × 2GHz), and the total network bandwidth can reach 27Tbps.
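The per-direction figure follows directly from the path width and the clock frequency. The short sketch below reproduces that arithmetic; the constant and function names are illustrative, and it does not attempt to derive the 27Tbps total, since the article does not give the number of rows and columns.

```python
NOC_CLOCK_HZ = 2_000_000_000  # fixed 2GHz NoC clock, from the text
PATH_WIDTH_BITS = 256         # width of one unidirectional path


def path_bandwidth_gbps(width_bits, clock_hz):
    """Raw bandwidth of one path: bits per cycle times cycles per second."""
    return width_bits * clock_hz / 1e9


per_direction = path_bandwidth_gbps(PATH_WIDTH_BITS, NOC_CLOCK_HZ)
print(per_direction)      # 512.0 Gbps in each direction
print(2 * per_direction)  # 1024.0 Gbps for both directions of one row or column
```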
The following table lists the characteristics of NoC in Speedster 7t FPGA.
Table 1 NoC characteristics in Speedster 7t FPGA
NoC operating frequency: 2GHz
Protocols supported by NoC: (1) AXI protocol (256-bit); (2) Ethernet packet format (256-bit); (3) raw data format transfer (288-bit)
NoC access points (NAPs): 80 masters, 80 slaves
Latency: increases by 1ns or 1.5ns after each NAP
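The per-NAP delay listed above gives a rough way to bound the transport latency across the NoC. The sketch below is an estimate only: the table does not say when the 1ns or the 1.5ns increment applies, so it returns a lower and an upper bound, and the function name is hypothetical.

```python
def nap_latency_range_ns(num_naps, low_ns=1.0, high_ns=1.5):
    """Bound the added latency after passing num_naps access points.

    Every NAP adds either low_ns or high_ns, so the total lies between
    the all-low and the all-high case.
    """
    return num_naps * low_ns, num_naps * high_ns


print(nap_latency_range_ns(4))  # (4.0, 6.0)
```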
NoC provides the following important advantages for FPGA:
(1) It significantly improves design performance and resolves the performance bottleneck of high-performance applications such as 400G Ethernet. After the 400G Ethernet MAC unpacks the data stream, the data typically has a very wide bus width and must run at a very high frequency, which is impossible to achieve in traditional FPGA logic. The NoC removes this bottleneck. We will explain this in detail in a follow-up article.
(2) The NoC is an additional routing resource alongside traditional programmable logic, so it reduces the risk of placement and routing congestion in designs with high resource utilization.
(3) The NoC includes asynchronous clock domain crossing, arbitration control, and other logic that replaces the traditional logic used for high-speed interface and bus management. Using the NoC therefore simplifies the user design and saves traditional resources (LEs, FIFOs, routing, and so on).
(4) The NoC is hardened ASIC logic, so its power consumption is much lower than that of an equivalent implementation in FPGA programmable logic.
(5) The NoC makes truly modular design possible. A traditional high-end FPGA design usually requires a team of FPGA engineers. Each engineer designs a module and debugs and verifies it within the full FPGA chip, and the modules are then connected into a larger complete design. As resource consumption rises, it usually takes a lot of time to optimize the layout, or even modify the design, to reach the target performance. In the Achronix Speedster 7t, modules can be interconnected through the NoC. With the help of fixed placement once a single module’s function and performance have been debugged, the overall design may need no additional joint debugging after NoC interconnection. This can greatly reduce R&D workload and time.
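The 400G Ethernet bottleneck mentioned in advantage (1) comes down to simple arithmetic: bus width times clock frequency must at least match the line rate. The sketch below works through that relation. The bus widths are illustrative values, not figures from Achronix documentation, and protocol overhead is ignored.

```python
def required_clock_mhz(line_rate_gbps, bus_width_bits):
    """Minimum clock needed for a bus of the given width to carry the line rate."""
    return line_rate_gbps * 1e9 / bus_width_bits / 1e6


# Hypothetical bus widths for a 400Gbps data stream.
for width in (256, 512, 1024):
    print(width, required_clock_mhz(400, width))
# 256 1562.5
# 512 781.25
# 1024 390.625
```

A narrow bus needs a clock far beyond what programmable logic can normally reach, while a wide bus lowers the clock at the cost of much more routing. Under these assumptions, a 256-bit path at 2GHz (512Gbps) has enough raw bandwidth for a 400Gbps stream.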