Interconnect, D2D Communication and Compute Chiplets

Chiplets offer a significant reduction in total die cost by decomposing a large monolithic integrated circuit into smaller, manageable pieces. This modular approach relies on fabricating multiple smaller dies, which are typically less expensive to produce than a single large die because smaller dies yield better in manufacturing.

The cost-effectiveness of chiplets is most pronounced in designs involving large dies with minimal redundancy. In such cases, producing a single large die is economically challenging: the larger the die, the more likely it is to contain a fabrication defect, so yields fall and cost rises. By utilizing chiplets, it is possible to selectively integrate different application-specific functions into a heterogeneous SoC, each fabricated on a process optimized for both performance and cost. This not only optimizes the performance of each individual component but also improves overall system efficiency by tailoring each chiplet to its specific application requirements.
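
To see why smaller dies yield better, consider the simple Poisson yield model, in which the probability that a die is defect-free falls exponentially with its area. The sketch below compares one large die against chiplets of the same total silicon area; the defect density and die sizes are illustrative assumptions, not foundry data.

```python
import math

def poisson_yield(die_area_mm2: float, defect_density_per_mm2: float) -> float:
    """Fraction of dies that are defect-free under the Poisson yield model."""
    return math.exp(-die_area_mm2 * defect_density_per_mm2)

D0 = 0.001  # defects per mm^2 (illustrative assumption, not process data)

# One 800 mm^2 monolithic die vs. four 200 mm^2 chiplets of equal total area.
monolithic = poisson_yield(800, D0)
chiplet = poisson_yield(200, D0)

print(f"Monolithic 800 mm^2 die yield: {monolithic:.1%}")  # ~44.9%
print(f"Single 200 mm^2 chiplet yield: {chiplet:.1%}")     # ~81.9%
# Known-good chiplets are combined at assembly, so the effective cost per
# working system tracks the per-chiplet yield, not the monolithic yield.
```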

Critically, for AI and hyperscale data centers, chiplet architectures not only reduce cost but also enable SoCs that exceed the reticle limit and implement increased levels of interconnect. To enable this, foundries such as TSMC and Samsung have developed advanced packaging and assembly techniques, for example CoWoS (Chip on Wafer on Substrate), and die-to-die communication protocols, such as the leading open standard UCIe, have been created to ensure that the chiplets act as a cohesive system.

As semiconductor processes advance to next-generation nodes, the economic advantages of chiplets are expected to become even more significant. Consequently, adopting a chiplet-based design – with its reduced non-recurring engineering (NRE) costs, improved yields and reduced need for the most advanced processes – can lead to significant cost savings compared to manufacturing a larger monolithic die.

Chiplets

Alphawave Semi’s leading IP and custom silicon expertise form the foundation for its prebuilt connectivity chiplets. This cost-effective and flexible approach delivers connectivity at higher bandwidth and lower power than traditional infrastructure solutions. Advanced packaging technologies make it possible to intelligently combine various chip functions by stacking dies onto a single substrate. Using the N-1 (or even N-2) process for the I/O chiplet can increase savings and decrease time to market by breaking up the SoC into smaller, more efficient building blocks.

AlphaCHIP-I/O

Reconfigurable 112G SerDes I/O with integrated protocol controllers, security IP, and UCIe PHY and Controller IP, enabling up to 1.6T of throughput at MR, XLR, and PCIe/CXL reaches.

The 1.6T high-speed I/O chiplet delivers exceptional data throughput, enabling up to 1.6 Tbps of bandwidth. Built with multi-protocol 112G SerDes I/O and integrating protocol controllers and security IP, it caters to the most demanding data transfer requirements. This chiplet ensures that systems can handle massive data volumes with minimal latency, making it ideal for applications in artificial intelligence, machine learning, and big data analytics. Its high-speed capabilities support the rapid scaling of data center operations, contributing to improved performance, efficiency, and the ability to meet future technological advancements without the need for extensive hardware overhauls.
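
As a rough sanity check on the 1.6T headline figure, the arithmetic below assumes a 16-lane configuration of the 112G SerDes; the lane count is an assumption for illustration, since the brief quotes only the aggregate rate.

```python
LANE_RATE_GBPS = 112   # per-lane 112G SerDes signaling rate
LANES = 16             # assumed lane count, for illustration only

raw_gbps = LANE_RATE_GBPS * LANES
print(f"Aggregate raw bandwidth: {raw_gbps} Gbps (~{raw_gbps / 1000:.2f} Tbps)")
# 16 x 112G = 1792 Gbps raw; usable throughput after protocol framing and
# encoding overhead lands in the ~1.6 Tbps range quoted for the chiplet.
```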

The combo PCIe/CXL/Ethernet chiplet offers remarkable versatility by combining PCI Express (PCIe), Compute Express Link (CXL), and Ethernet protocols into a single, integrated solution. This multi-protocol support allows for seamless communication between processors, accelerators, and networking components. By unifying these interfaces, the chiplet simplifies system architecture and optimizes data pathways for higher throughput. This is particularly beneficial in high-performance computing and data center environments where workload demands are dynamic, and efficient resource sharing is essential for maximizing computational efficiency.
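
As a conceptual illustration of how a multi-protocol port might be partitioned, the sketch below models lane bifurcation in Python; the `Protocol` enum, `PortConfig` type, 16-lane budget, and validation helper are all hypothetical constructs for illustration, not the chiplet's actual software interface.

```python
from dataclasses import dataclass
from enum import Enum

class Protocol(Enum):
    PCIE = "PCIe"
    CXL = "CXL"
    ETHERNET = "Ethernet"

@dataclass
class PortConfig:
    protocol: Protocol
    lanes: int

def validate_bifurcation(ports: list[PortConfig], total_lanes: int = 16) -> None:
    """Check that a proposed lane split fits a hypothetical SerDes budget."""
    used = sum(p.lanes for p in ports)
    if used > total_lanes:
        raise ValueError(f"{used} lanes requested, only {total_lanes} available")

# Example: share one physical SerDes pool between a CXL device link and an
# Ethernet uplink instead of dedicating a separate chip to each protocol.
validate_bifurcation([PortConfig(Protocol.CXL, 8), PortConfig(Protocol.ETHERNET, 8)])
```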

Leveraging advanced 112G SerDes technology and integrated protocol controllers, it supports data transmission rates up to 1.6 Tbps, maintaining high signal integrity over long reaches.

The chiplet provides an efficient solution for transmitting high-speed data over medium distances using optical fibers. By integrating reconfigurable 112G SerDes I/O with an optimized optical driver capability, it enables seamless and reliable communication between devices within data centers or campus networks. This chiplet enhances performance by reducing latency and increasing bandwidth, making it ideal for applications that require swift data exchange without the complexity of long-haul communication systems. Its design simplifies the integration of optical interfaces, leading to cost savings and improved energy efficiency in networking scenarios where copper links fall short.

Arm Compute Chiplet

High-performance Arm® Neoverse™ Compute Cluster – high-performance compute chiplet for artificial intelligence/machine learning (AI/ML), high-performance compute (HPC), data center and 5G/6G networking infrastructure applications

  • Arm Neoverse class compute cluster
  • High-speed PCIe links
  • UCIe die-to-die interconnect
  • High-performance memory

The Arm Neoverse class compute cluster offers cutting-edge performance for a wide range of computing applications. Built on the advanced Arm Neoverse architecture, it provides high computational density and efficiency, making it ideal for artificial intelligence (AI), machine learning (ML), high-performance computing (HPC), data centers, and 5G/6G networking infrastructure. This compute cluster enhances processing capabilities, enabling faster data analysis and improved workflow efficiency in complex computing environments.

High-speed PCIe links

Incorporating high-speed PCIe links significantly enhances the data transfer capabilities of the compute chiplet. PCIe is a high-bandwidth, low-latency interface standard that enables fast communication between the compute cluster and peripheral devices such as xPUs, storage solutions, and networking components. The integration of the latest PCIe standards ensures that the chiplet can handle increased data loads and high-speed signal transmission, which is essential for applications that require rapid data exchange and real-time processing. This leads to reduced bottlenecks within the system architecture, allowing for smoother performance and the ability to scale resources as needed. For AI/ML applications and HPC workloads, high-speed PCIe links facilitate quick access to data and resources, thereby accelerating computational tasks and improving overall efficiency.
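
To put numbers on "high-bandwidth", the sketch below computes peak raw per-direction bandwidth for x16 links at the per-lane rates defined by recent PCIe generations; usable throughput in practice is slightly lower once encoding and protocol overhead are applied.

```python
# Peak one-direction raw bandwidth of a PCIe link, before encoding/FLIT overhead.
def pcie_raw_gbytes_per_s(gt_per_s: float, lanes: int) -> float:
    # 1 GT/s on a lane carries 1 Gbit/s of raw symbols; divide by 8 for bytes.
    return gt_per_s * lanes / 8

for gen, rate in [("Gen4", 16), ("Gen5", 32), ("Gen6", 64)]:
    print(f"PCIe {gen} x16: {pcie_raw_gbytes_per_s(rate, 16):.0f} GB/s raw per direction")
# Gen4 x16: 32 GB/s, Gen5 x16: 64 GB/s, Gen6 x16: 128 GB/s (raw; usable
# throughput is a few percent lower after 128b/130b or FLIT encoding).
```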

UCIe die-to-die interconnect

The UCIe (Universal Chiplet Interconnect Express) die-to-die interconnect provides a standardized, high-bandwidth communication protocol between chiplets within a single package. This specification enables seamless integration of various chiplets, such as compute, memory, and specialized accelerators, allowing them to function cohesively as a unified system. The UCIe interconnect offers low-latency, high-throughput communication channels, which are critical for maintaining optimal performance in complex computing environments. By utilizing a universal interconnect standard, the UCIe consortium promotes interoperability and flexibility in system design, enabling developers to mix and match chiplets from different vendors without compatibility issues. This fosters innovation and reduces time to market by simplifying the integration process and enabling scalable system architectures tailored to specific application needs.
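
The UCIe specification organizes the physical layer into modules of 16 lanes (standard package) or 64 lanes (advanced package) at per-lane rates up to 32 GT/s in UCIe 1.x. The short calculation below, a sketch under those parameters, shows the resulting raw per-module bandwidth.

```python
# Raw one-direction bandwidth of a single UCIe module, using the UCIe 1.x
# module widths for standard (16-lane) and advanced (64-lane) packaging.
def ucie_module_gbytes_per_s(lanes: int, gt_per_s: float) -> float:
    return lanes * gt_per_s / 8

print(f"Standard package, 32 GT/s: {ucie_module_gbytes_per_s(16, 32):.0f} GB/s")  # 64 GB/s
print(f"Advanced package, 32 GT/s: {ucie_module_gbytes_per_s(64, 32):.0f} GB/s")  # 256 GB/s
# Multiple modules can be placed along a die edge, so aggregate die-to-die
# bandwidth scales with available shoreline rather than a single link's width.
```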

High-performance memory

High-performance memory is a crucial component that complements the processing capabilities of the compute cluster. It provides fast and efficient data storage and retrieval, which is essential for maintaining high levels of performance in data-intensive applications. Advanced memory technologies integrated into the chiplet, such as high-bandwidth memory (HBM) or DDR, offer increased memory capacity and speed, reducing latency and allowing for quicker access to large datasets. This is particularly beneficial in AI and ML applications, where rapid processing of vast amounts of data is necessary for training models and inference operations. In HPC and data center environments, high-performance memory ensures that processors are not idle waiting for data, thereby optimizing resource utilization and improving overall system throughput. Enhanced memory performance contributes to faster computational results, greater efficiency, and the ability to handle more complex tasks with ease.
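
To make the HBM-versus-DDR comparison concrete, the sketch below applies the standard peak-bandwidth formula to one HBM3 stack and one DDR5-6400 channel; the specific speed grades are illustrative assumptions.

```python
# Peak bandwidth = interface width (bits) x per-pin data rate (Gb/s) / 8.
def peak_gbytes_per_s(width_bits: int, gbps_per_pin: float) -> float:
    return width_bits * gbps_per_pin / 8

hbm3_stack = peak_gbytes_per_s(1024, 6.4)  # one HBM3 stack at 6.4 Gb/s per pin
ddr5_chan = peak_gbytes_per_s(64, 6.4)     # one DDR5-6400 channel

print(f"HBM3 stack: {hbm3_stack:.1f} GB/s")        # 819.2 GB/s
print(f"DDR5-6400 channel: {ddr5_chan:.1f} GB/s")  # 51.2 GB/s
# A single HBM stack delivers roughly 16x the bandwidth of a DDR5 channel,
# which is why HBM dominates AI/ML and HPC memory configurations.
```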