
The Aurora Supercomputer Is Installed: 2 ExaFLOPS, Tens of Thousands of CPUs and GPUs

Argonne National Laboratory and Intel said on Thursday that they had installed all 10,624 blades for the Aurora supercomputer, a machine announced back in 2015 with a very bumpy history. The system promises to deliver a peak theoretical compute performance of over 2 FP64 ExaFLOPS using its array of tens of thousands of Xeon Max 'Sapphire Rapids' CPUs with on-package HBM2E memory as well as Data Center GPU Max 'Ponte Vecchio' compute GPUs. The system will come online later this year.

"Aurora is the first deployment of Intel's Max Series GPU, the biggest Xeon Max CPU-based system, and the largest GPU cluster in the world," said Jeff McVeigh, Intel corporate vice president and general manager of the Super Compute Group.

The Aurora supercomputer is quite impressive by the numbers. The machine is powered by 21,248 general-purpose processors with over 1.1 million cores for workloads that require traditional CPU horsepower, and 63,744 compute GPUs that will serve AI and HPC workloads. On the memory side of things, Aurora has 1.36 PB of on-package HBM2E memory and 19.9 PB of DDR5 memory used by the CPUs, as well as 8.16 PB of HBM2E carried by the Ponte Vecchio compute GPUs.

The Aurora machine uses 166 racks that house 64 blades each. It spans eight rows and occupies a space equal to two basketball courts. That does not count Aurora's storage subsystem, which employs 1,024 all-flash storage nodes offering 220 PB of storage capacity and a total bandwidth of 31 TB/s. For now, Argonne National Laboratory does not publish official power consumption numbers for Aurora or its storage subsystem.
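The published figures are internally consistent: a quick sketch, using only the counts reported above and assuming every blade has an identical CPU/GPU complement, recovers the per-blade composition.

```python
# Sanity-check the published Aurora figures (all counts from the article).
racks, blades_per_rack = 166, 64
blades = racks * blades_per_rack
assert blades == 10_624  # matches the installed blade count

cpus, gpus = 21_248, 63_744
# Assuming a uniform blade composition, the totals divide evenly:
print(cpus // blades, "CPUs per blade")
print(gpus // blades, "GPUs per blade")
```

This works out to 2 Xeon Max CPUs and 6 Ponte Vecchio GPUs per blade.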

The supercomputer, which will be used for a wide variety of workloads from nuclear fusion simulations to weather prediction and from aerodynamics to medical research, uses HPE's Shasta supercomputer architecture with Slingshot interconnects. Meanwhile, before the system passes ANL's acceptance tests, it will be used to train large-scale scientific generative AI models.

"While we work toward acceptance testing, we are going to be using Aurora to train some large-scale open-source generative AI models for science," said Rick Stevens, Argonne National Laboratory associate laboratory director. "Aurora, with over 60,000 Intel Max GPUs, a very fast I/O system, and an all-solid-state mass storage system, is the perfect environment to train these models."

Though Aurora's blades have been installed, the supercomputer still has to undergo and pass a series of acceptance tests, a typical procedure for supercomputers. Once it successfully clears them and comes online later in the year, it is projected to reach a theoretical performance exceeding 2 ExaFLOPS (two billion billion floating point operations per second). With that massive performance, it is expected to secure the top spot in the Top500 list.

The installation of the Aurora supercomputer marks several milestones: it is the industry's first supercomputer with performance higher than 2 ExaFLOPS and the first Intel-based ExaFLOPS-class machine. Finally, it marks the conclusion of the Aurora saga that began eight years ago, as the supercomputer's journey has seen its fair share of bumps.

Initially unveiled in 2015, Aurora was originally meant to be powered by Intel's Xeon Phi co-processors and was projected to deliver roughly 180 PetaFLOPS in 2018. However, Intel decided to abandon the Xeon Phi in favor of compute GPUs, resulting in the need to renegotiate the agreement with Argonne National Laboratory to deliver an ExaFLOPS system by 2021.

The delivery of the system was further delayed by issues with Ponte Vecchio's compute tile, caused by the delay of Intel's 7 nm (now known as Intel 4) manufacturing node and the need to redesign the tile for TSMC's N5 (5 nm-class) process technology. Intel finally launched its Data Center GPU Max products late last year and has now shipped over 60,000 of these compute GPUs to ANL.