Google said it has built the world's fastest machine learning (ML) training supercomputer, one that broke AI performance records in six of the eight industry-leading MLPerf benchmarks. Using this supercomputer, along with its latest Tensor Processing Unit (TPU) chip, Google set the new records.
“We achieved these results with ML model implementations in TensorFlow, JAX and Lingvo. Four of the eight models were trained from scratch in under 30 seconds,” Naveen Kumar from Google AI said in a statement.
To put that in perspective, it took more than three weeks to train one of these models on the most advanced hardware accelerator available in 2015.
Just five years later, Google's latest TPU supercomputer can train the same model almost five orders of magnitude faster.
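The "five orders of magnitude" claim follows from the two figures above. A quick sanity check (illustrative arithmetic only; the three-week and 30-second figures are the ones reported, not independent measurements):

```python
import math

# Training time on 2015 hardware: more than three weeks.
seconds_2015 = 3 * 7 * 24 * 3600   # ≈ 1,814,400 seconds

# Fastest runs on Google's 2020 TPU supercomputer: under 30 seconds.
seconds_2020 = 30

speedup = seconds_2015 / seconds_2020        # ≈ 60,480x
orders = math.log10(speedup)                 # ≈ 4.8 -> "almost five orders of magnitude"

print(f"speedup ≈ {speedup:,.0f}x (~{orders:.1f} orders of magnitude)")
```

A 60,000-fold speedup is log10(60,480) ≈ 4.8 orders of magnitude, consistent with the article's "almost five."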
MLPerf models are chosen to be representative of cutting-edge machine learning workloads that are common throughout industry and academia.
The supercomputer Google used for the MLPerf training round is four times larger than the “Cloud TPU v3 Pod” that set three records in the previous competition.
Graphics giant Nvidia said it also delivered the world’s fastest Artificial Intelligence (AI) training performance among commercially available chips, a feat that will help big enterprises tackle the most complex challenges in AI, data science and scientific computing.
Nvidia A100 GPUs and DGX SuperPOD systems were declared the world’s fastest commercially available products for AI training, according to MLPerf benchmarks.
The A100 Tensor Core GPU demonstrated the fastest performance per accelerator on all eight MLPerf benchmarks.
“The real winners are customers applying this performance today to transform their businesses faster and more cost effectively with AI,” the company said in a statement.
The A100, the first processor based on the Nvidia Ampere architecture, hit the market faster than any previous Nvidia GPU.
The world's leading cloud providers, including Amazon Web Services (AWS), Baidu Cloud, Microsoft Azure and Tencent Cloud, are helping meet the strong demand for the Nvidia A100, as are dozens of major server makers, including Dell Technologies, Hewlett Packard Enterprise, Inspur and Supermicro.
“Users across the globe are applying the A100 to tackle the most complex challenges in AI, data science and scientific computing,” said the company.
(IANS)