Description
At the heart of NVIDIA's A100 GPU is the NVIDIA Ampere architecture, which introduces double-precision tensor cores delivering more than 2x the throughput of the V100 – a significant reduction in simulation run times. Double-precision FP64 performance is 9.7 TFLOPS, and with tensor cores this doubles to 19.5 TFLOPS. Single-precision FP32 performance is 19.5 TFLOPS, and with the new TensorFloat-32 (TF32) precision this increases to 156 TFLOPS (312 TFLOPS with sparsity) – up to ~20x higher than the previous-generation V100. TF32 is a hybrid of the FP16 and FP32 formats: it uses the same 10-bit mantissa as FP16 and the same 8-bit exponent as FP32, trading a little precision for large throughput gains on tensor-core workloads.
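To make the TF32 format concrete, here is a minimal sketch that emulates TF32 by truncating an FP32 bit pattern's 23-bit mantissa down to its top 10 bits. This is only an illustration: the `to_tf32` helper is hypothetical, and real tensor-core hardware rounds to nearest rather than truncating.

```python
import struct

def to_tf32(x: float) -> float:
    """Approximate TF32 by truncation: keep FP32's 8-bit exponent
    and only the top 10 of its 23 mantissa bits (illustrative only;
    the hardware rounds to nearest rather than truncating)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    bits &= ~((1 << 13) - 1)  # zero the low 13 mantissa bits
    return struct.unpack(">f", struct.pack(">I", bits))[0]
```

For example, `1.0 + 2**-11` falls below TF32's 10-bit mantissa resolution near 1.0 and collapses to `1.0`, while `1.0 + 2**-10` is exactly representable and survives.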
Furthermore, this A100 carries a massive 80GB of high-bandwidth memory (HBM2e) delivering over 2TB/s of memory bandwidth – more than 2x the bandwidth of the previous-generation V100.
| A100 80GB SXM | | |
|---|---|---|
| FP64 | 9.7 TFLOPS | |
| FP64 Tensor Core | 19.5 TFLOPS | |
| FP32 | 19.5 TFLOPS | |
| Tensor Float 32 (TF32) | 156 TFLOPS | 312 TFLOPS* |
| BFLOAT16 Tensor Core | 312 TFLOPS | 624 TFLOPS* |
| FP16 Tensor Core | 312 TFLOPS | 624 TFLOPS* |
| INT8 Tensor Core | 624 TOPS | 1,248 TOPS* |
| GPU Memory | 80GB HBM2e | |
| GPU Memory Bandwidth | 2,039 GB/s | |
| Max Thermal Design Power (TDP) | 400W | |
| Multi-Instance GPU | Up to 7 MIGs @ 10GB | |
| Form Factor | SXM | |
| Interconnect | NVLink: 600 GB/s; PCIe Gen4: 64 GB/s | |
| Server Options | NVIDIA HGX™ A100 partner and NVIDIA-Certified Systems with 4, 8, or 16 GPUs; NVIDIA DGX™ A100 with 8 GPUs | |

\* With sparsity
New Features
In addition to the product specification improvements noted above, the NVIDIA A100 introduces three key new features that further accelerate High-Performance Computing (HPC), training, and Artificial Intelligence (AI) inference workloads:
- 3rd Generation NVIDIA NVLink™ – The new generation of NVLink™ delivers 2x the GPU-to-GPU throughput of the previous-generation V100.
- Multi-Instance GPU (MIG) – This feature enables a single A100 GPU to be partitioned into as many as seven separate GPU instances, which benefits cloud users looking to utilize their GPUs for AI inference and data analytics workloads.
- Structural Sparsity – This feature supports sparse matrix operations in the tensor cores and increases the throughput of tensor core operations by 2x (see Figure 6).
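Structural Sparsity relies on the 2:4 fine-grained pattern: weights are pruned so that at most two of every four consecutive values are nonzero, which the sparse tensor cores then exploit to skip the zeroed multiplies. The following sketch (a hypothetical helper, not NVIDIA's API) shows what a 2:4 pruning pass looks like:

```python
def prune_2_4(row):
    """Zero the two smallest-magnitude values in each group of four,
    producing the 2:4 structured-sparse pattern the A100's sparse
    tensor cores accelerate (illustrative helper, not NVIDIA's API)."""
    out = []
    for i in range(0, len(row), 4):
        group = row[i:i + 4]
        # indices of the two largest-magnitude entries in this group
        keep = sorted(range(len(group)),
                      key=lambda j: abs(group[j]),
                      reverse=True)[:2]
        out.extend(v if j in keep else 0.0 for j, v in enumerate(group))
    return out
```

Because exactly half the values in every group are guaranteed zero, the hardware can compact the operands and, in the best case, double effective tensor-core throughput – which is where the starred "with sparsity" numbers in the table above come from.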