IP and Chiplets for PCIe Gen6 and Gen7

The PCI Express® (PCIe) standard is a high-speed serial bus protocol that provides point-to-point communication between CPUs and devices in AI, autonomous vehicles, and other advanced applications requiring low latency and high reliability.

With a data rate of 64 GT/s, the latest published generation, PCIe 6.0, has a total bandwidth over 16 lanes of 128 GBps in each direction. The upcoming PCIe 7.0, due to be published in 2025, is set to double this to 128 GT/s, giving 256 GBps over 16 lanes and a bidirectional bandwidth of 512 GBps. These generations also employ enhanced modulation to deliver this data, with PCIe 6.0 using Pulse Amplitude Modulation with four levels (PAM4) to double the data rate. This in turn requires rigorous signal integrity and improved low-latency error correction techniques, which are critical for AI workloads and hyperscale data centers.

The protocol is also being implemented via chiplets – modular semiconductor components that are assembled into complex systems – and this further advances PCIe’s application space.

What is the PCIe Standard?

Designed as a point-to-point alternative to and replacement for PCI (Peripheral Component Interconnect), PCI-X, and AGP, PCI Express (PCIe) is a modern serial computer expansion bus protocol that delivers exceptional transfer speeds between peripherals and the CPU.

It is a core transmission standard in AI data centers, but is also widely used for automotive applications that demand large volumes of data to be transmitted at high speeds.

Serial bus

PCIe operates as a high-speed serial bus, with data transmitted sequentially over a single channel per lane.

This approach reduces the number of physical wires, as well as electromagnetic interference (EMI), and thereby improves data integrity in comparison with parallel buses, which transmit multiple bits simultaneously across several channels.

In AI data centers, where PCIe operates alongside Ethernet, this design enables greater bandwidth as well as higher clock speeds. It also brings greater scalability and adaptability, which are proving vital as AI data centers evolve rapidly in their capabilities.

Interconnect

The protocol uses a point-to-point interconnect model for communication between devices.

Under the standard, each peripheral device is connected via dedicated lanes and communicates directly with the CPU, giving it full bandwidth availability without competing for resources. This allows for low-latency, high-speed data transfers and enables resources to be allocated dynamically, making the standard ideal for applications that require quick, efficient communication between components such as GPUs, SSDs, and network interfaces.

Importantly for multipurpose systems with varying bandwidth requirements, the standard also enables low-speed, low-bandwidth peripherals to be allocated a single lane, while high-speed, high-bandwidth connections are allocated wider links. For example, a host or root complex can have a 16-lane-capable bus. A low-bandwidth peripheral might implement a single-lane PCIe endpoint, and the host can effortlessly drop to single-lane support to save power, while that same host bus can also accept a 16-lane link for a high-end graphics card, as sketched below. As described in the next sections, PCIe flexibly supports multiple lane widths and bandwidths.
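As a rough illustration of that width matching (the real mechanism is the link-training handshake defined by the specification, which this sketch does not implement), the negotiated width can be modeled as the widest standard width both partners support; the function name and values here are purely illustrative:

```python
# Simplified sketch of PCIe link-width matching; real hardware negotiates
# this during link training rather than via a software call like this.

SUPPORTED_WIDTHS = (1, 2, 4, 8, 16)  # standard PCIe link widths

def negotiate_width(host_max_lanes: int, device_max_lanes: int) -> int:
    """Return the widest standard link width both partners can support."""
    limit = min(host_max_lanes, device_max_lanes)
    return max(w for w in SUPPORTED_WIDTHS if w <= limit)

# A x16-capable host port trains down to x1 for a single-lane endpoint,
# saving power, yet links at full x16 to a high-end graphics card.
print(negotiate_width(16, 1))   # -> 1
print(negotiate_width(16, 16))  # -> 16
```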

Lane

Each PCIe lane consists of two differential signaling pairs, one for transmit (Tx) and one for receive (Rx). Devices can be connected using multiple lanes – e.g. x1, x4, x8 or x16 – with each lane providing a specific amount of bandwidth, which varies according to the PCIe generation and gives the standard significant scalability in its performance.

Bandwidth

In PCIe generation 6.0, the data rate per lane is 64 GT/s, which equates to 8 GBps per lane in each direction and a theoretical maximum bandwidth in a PCIe 6.0 x16 connection of 128 GBps per direction.

PCIe’s next evolution (PCIe 7.0) is set to be published in 2025 and will double this again, giving 256 GBps per direction and a total theoretical maximum bidirectional bandwidth of 512 GBps across 16 lanes.
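The arithmetic behind these headline numbers is simple division; the sketch below (hypothetical helper names, deliberately ignoring line-coding and flit overheads, which is why it reproduces the rounded figures quoted above) converts a per-lane transfer rate into link bandwidth:

```python
# Back-of-the-envelope PCIe bandwidth: GT/s -> GBps, ignoring encoding
# and flit overheads, so results match the rounded headline figures.

def lane_bandwidth_gbps(gt_per_s: float) -> float:
    """Approximate GBps per lane, per direction (8 bits per byte)."""
    return gt_per_s / 8

def link_bandwidth_gbps(gt_per_s: float, lanes: int = 16,
                        bidirectional: bool = False) -> float:
    bw = lane_bandwidth_gbps(gt_per_s) * lanes
    return bw * 2 if bidirectional else bw

print(link_bandwidth_gbps(64))                       # PCIe 6.0 x16: 128 GBps
print(link_bandwidth_gbps(128))                      # PCIe 7.0 x16: 256 GBps
print(link_bandwidth_gbps(128, bidirectional=True))  # PCIe 7.0 x16: 512 GBps
```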

As such, PCIe 6.0 and PCIe 7.0 are being adopted for AI and hyperscale data centers, while older, more mature generations of the standard, such as PCIe 4.0 and PCIe 5.0, are being adopted for automotive and similarly cost-sensitive applications.

PCIe Applications

PCIe is a versatile interface used in a broad range of applications, notably in artificial intelligence (AI) and storage solutions. Its scalability, high bandwidth, and low latency make it indispensable in these fields, enabling efficient data processing and transfer.

The standard is also being adopted strongly by the automotive industry (amongst others) to handle the vast quantities of data being generated by the cameras, Radar, LiDAR and ultrasound sensors in today’s ADAS and autonomous vehicles.

PCIe in AI and hyperscale data centers

In AI and leading-edge data centers, PCIe plays a crucial role in connecting advanced processors and server systems, allowing data to move between key components.

The immense data throughput provided by the latest generations of PCIe ensures that large datasets can be processed swiftly, handing off to memory and storage seamlessly.

With the advent of PCIe 6.0 and the forthcoming PCIe 7.0, AI systems can achieve lower latencies and higher data transfer speeds, thereby improving the performance and efficiency of training models and inferencing tasks.

PCIe in automotive applications

In the automotive industry, PCIe is playing an increasingly important role in enabling advanced driver-assistance systems (ADAS) and autonomous driving technologies to be implemented.

PCIe’s high-speed data transfer capabilities allow for seamless communication, and the standard is particularly important for relaying sensor-fusion information between the core processors that build the vehicle’s understanding of the world around it from the data generated by the cameras, LiDAR and Radar sensors.

Evolution of PCIe – and PCIe 6.0 vs PCIe 7.0

Since its introduction, the PCIe standard has undergone several iterations, each improving upon data transfer rates, bandwidth, and efficiency. The evolution from PCIe 1.0 to the upcoming PCIe 7.0 reflects the growing demand for higher performance in computing systems, with each version doubling the bandwidth of its predecessor.

| Version | Year Released | Bandwidth per Lane (x1) | Total Bandwidth (x16) | Data Rate |
|---------|---------------|-------------------------|-----------------------|-----------|
| PCIe 1.0 | 2003 | 250 MBps | 4 GBps | 2.5 GT/s |
| PCIe 2.0 | 2007 | 500 MBps | 8 GBps | 5 GT/s |
| PCIe 3.0 | 2010 | 1 GBps | 16 GBps | 8 GT/s |
| PCIe 4.0 | 2017 | 2 GBps | 32 GBps | 16 GT/s |
| PCIe 5.0 | 2019 | 4 GBps | 64 GBps | 32 GT/s |

PCIe 6.0

PCIe 6.0 is particularly targeted towards AI, data centers, and other high-performance computing (HPC) applications. Published in 2022, it brings significant enhancements over its predecessors: with a data rate of 64 GT/s, PCIe 6.0 enables a bandwidth of up to 8 GBps per lane.

The key innovation in PCIe 6.0 is the adoption of PAM4 (Pulse Amplitude Modulation with four levels), which doubles the data rate without requiring a proportional increase in clock speed.
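To see why this works, note that each PAM4 symbol takes one of four amplitude levels and therefore carries two bits, where an NRZ symbol carries one. The minimal sketch below uses a Gray-coded mapping (commonly used for PAM4 so that adjacent levels differ by a single bit) with normalized levels rather than real electrical voltages:

```python
# Minimal PAM4 sketch: each symbol carries 2 bits, so the bit rate is twice
# the symbol rate for the same clock. Gray coding keeps adjacent levels one
# bit apart, limiting the damage of a single-level decision error.

PAM4_LEVELS = {(0, 0): -3, (0, 1): -1, (1, 1): +1, (1, 0): +3}

def pam4_encode(bits: list[int]) -> list[int]:
    """Map an even-length bit stream onto a stream of PAM4 levels."""
    pairs = zip(bits[0::2], bits[1::2])
    return [PAM4_LEVELS[pair] for pair in pairs]

# Eight bits become four symbols: same symbol rate, double the NRZ data rate.
print(pam4_encode([0, 0, 0, 1, 1, 1, 1, 0]))  # -> [-3, -1, 1, 3]
```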

PCIe 6.0 also implements Forward Error Correction (FEC), which improves data integrity at these higher speeds and modulations without overburdening the system with latency, and fixed-length flow control units (FLITs), which simplify data management and improve bandwidth efficiency.
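The widely cited PCI-SIG breakdown of the fixed 256-byte flit – 236 bytes of TLP payload plus 6 bytes of DLP, 8 bytes of CRC and 6 bytes of FEC – shows where the efficiency comes from; the dataclass below is an illustrative sketch of that layout, not an actual implementation:

```python
# Illustrative layout of the fixed 256-byte PCIe 6.0 flit: fixed framing
# gives a predictable unit over which CRC and lightweight FEC are applied.
from dataclasses import dataclass

@dataclass(frozen=True)
class Flit:
    tlp_bytes: int = 236   # Transaction Layer Packet payload
    dlp_bytes: int = 6     # Data Link Layer payload
    crc_bytes: int = 8     # error detection
    fec_bytes: int = 6     # forward error correction

    @property
    def total(self) -> int:
        return self.tlp_bytes + self.dlp_bytes + self.crc_bytes + self.fec_bytes

    @property
    def efficiency(self) -> float:
        return self.tlp_bytes / self.total

flit = Flit()
print(flit.total)                # -> 256 bytes per flit
print(f"{flit.efficiency:.1%}")  # -> 92.2% of each flit is TLP payload
```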

PCIe 7.0

Expected to be published in 2025, PCIe 7.0 will further push the boundaries of data transfer rates to 128 GT/s, achieving up to 16 GBps per lane and 256 GBps per direction in a x16 configuration (512 GBps bidirectional). Building upon the advancements of PCIe 6.0, it aims to address the growing demands of AI, machine learning, and large-scale cloud computing environments. PCIe 7.0 is anticipated to continue using PAM4 and advanced error correction techniques while optimizing power efficiency to support the next generation of computing workloads.

Scale Up vs Scale Out Protocols

PCIe is a scale-up communication protocol. Scale-up protocols focus on increasing the capacity of a single system by adding more resources – i.e. faster interconnects – and emphasize high-performance, low-latency communication within a single node, facilitating fast internal data transfer.

Other examples of scale-up protocols include UALink, which enables accelerator-to-accelerator coherency, and NVLink, NVIDIA’s proprietary interconnect. Both are optimized for GPU-to-GPU communication and facilitate high-bandwidth transfers between high-bandwidth memory and processors within a system.

In contrast, scale-out communication protocols support the addition of more independent nodes to distribute workloads across multiple machines. These protocols focus on networked communication, emphasizing scalability, fault tolerance, and distributed resource management.

Examples of scale-out protocols include Ethernet – specifically Ultra Ethernet from the UEC – and InfiniBand™, both of which have been adopted for supercomputers as a result of their high throughput and low latency. Both support RDMA, which allows direct memory access from one computer to another without involving the CPU.

There are, of course, protocols that can be used for both scale-up and scale-out. For example, the complementary-to-PCIe standard CXL® has been designed for high-speed, low-latency CPU-to-device and CPU-to-memory interconnects. It is primarily a scale-up protocol that leverages PCIe’s physical layer and allows resources to be shared within the same node, expanding the internal capacity of systems. However, CXL 3.0 introduces features such as memory pooling across multiple nodes, allowing for some scale-out use cases, and can be implemented through PCIe-over-optics to enable disaggregated architectures.

PCIe Chiplets and IP

A chiplet is a small, modular piece of a semiconductor integrated circuit (IC) that can be combined with other chiplets to create a more complex, high-performance chip.

An analysis by Meta has shown that 38% of the time data spends in the data center is wasted sitting in networks. As a result, connectivity and bottlenecking are among the most critical limitations affecting advances in AI. Monolithic ICs have already reached the reticle limit, with die size capped at 858 mm², meaning SoC size (and critically perimeter size) cannot expand to incorporate more I/O onto the chip directly without shifting to something such as a chiplet model.

Unlike traditional monolithic chips, where all functions are integrated into a single silicon die, chiplets allow different functional blocks (such as CPUs, GPUs, memory controllers, and communication protocols like PCIe) to be fabricated separately and then combined using advanced packaging techniques such as TSMC’s CoWoS® (Chip on Wafer on Substrate) or Samsung’s I-Cube™, with interconnection enabled via protocols such as UCIe™ (Universal Chiplet Interconnect Express).

The chiplet approach offers several advantages, including increased manufacturing yields. It also enables individual chiplets to be developed on different process nodes before being combined, allowing cost-effective, mature analog processes to be paired with leading-edge processes for digital logic.

This modularity also speeds up development times and reduces costs since companies can reuse existing chiplets across multiple products.

PCIe Chiplet Advances

In Q2 and Q3 2024, Alphawave Semi announced several advances in PCIe technologies, with both IP- and chiplet-based announcements.

These included the industry’s first multi-protocol chiplet for high-performance compute and AI infrastructure, which combines Alphawave Semi’s IP portfolio of Ethernet, PCIe 6.0, CXL 3.x and UCIe Revision 1.1. Capable of delivering a total bandwidth of up to 1.6 Tbps, the Alphawave Semi chiplet enables up to 16 lanes of multi-standard PHY and was developed for TSMC’s 7 nm processes.

Additionally, Alphawave Semi demonstrated its silicon-ready PipeCORE PCIe IP for the PCIe 7.0 standard at PCI-SIG DevCon. The demonstration paired the IP with the Tektronix DPO70000 High-Performance Oscilloscope to validate transmitter performance at 128 GT/s (PAM4).

The company also extended its partnership with Samsung to bring its PCIe 7.0, 112G/224G Ethernet, and UCIe chiplet technologies and IP to Samsung’s advanced processes, including its 5 nm, 4 nm and 2 nm process nodes.