Since 1987 - Covering the Fastest Computers in the World and the People Who Run Them

MLCommons today released the latest MLPerf Inference (v3.0) results for the datacenter and edge. While Nvidia continues to dominate the results – topping all performance categories – other companies are joining the MLPerf constellation with impressive performances. Intel showcased early Sapphire Rapids-based systems, and Qualcomm's Cloud AI 100 was a strong performer, particularly in power metrics.

There were 25 submitting organizations, up from 21 last fall and 19 last spring. Newcomer participants included cTuning, Quanta Cloud Technology, SiMa and xFusion.

Also noteworthy was the discussion around generative AI – yes, more chatter about ChatGPT writ large – during a press/analyst pre-briefing this week. MLCommons executive director David Kanter said a large language model (LLM) will be added to the MLPerf benchmarking suite soon. (More coverage of the LLM discussion is deeper in the article.)

All in all, the latest MLPerf showing was impressive, with roughly 6,700 inference performance results and 2,400 power efficiency measurements reported. The submitters include Alibaba, ASUSTeK, Azure, cTuning, Deci.ai, Dell, Gigabyte, H3C, HPE, Inspur, Intel, Krai, Lenovo, Moffett, Nettrix, NEUCHIPS, Neural Magic, Nvidia, Qualcomm Technologies, Inc., Quanta Cloud Technology, Rebellions, SiMa, Supermicro, VMware, and xFusion, with nearly half of the submitters also measuring power efficiency.

Inferencing, while generally not as computationally intensive as training, is a critical element in AI delivery. There were no changes to the suite of tests in MLPerf Inference 3.0, but a new scenario – networking – was added. See the slides below for a visual summary.

Currently, BERT (bidirectional encoder representations from transformers) is the NLP model used by MLPerf. The expected proliferation of generative AI applications – think targeted versions of ChatGPT and DALL·E 2 – will likely produce a demand spike for inferencing infrastructure. How should MLPerf venture into the generative AI waters? Is BERT Large a good proxy for LLMs?

Interestingly, there was some consensus that BERT can serve as an early proxy for larger LLMs even though it is much smaller in scale (GPT-3 has 175 billion parameters; BERT Large has on the order of 300 million).

Intel's Jordan Plawner, senior director, AI products, echoing others, pointed to "our early results of testing these much larger models, much larger than large BERT."

Karl Freund, founder and principal analyst, Cambrian AI Research, added, "Just to elaborate a little, I think anyone who's running BERT Large as a training model, inference model, [is using it] at least as a proxy for running these smaller GPT models as well. I think the other way to think about it is that these large models like GPT-3 and GPT-4 are going to float all boats in that they're going to generate hundreds if not thousands of smaller models that are distilled down from these very large models."

Kanter made a similar point about the economics at stake: "Talking with Andrew Feldman (founder/CEO, Cerebras) a couple of weeks ago, he said something that stuck in my mind. He said, 'You know, AI has been interesting from a revenue standpoint; it's primarily been impactful to the hardware vendors. Now, suddenly, there's a massive pile of money on the table – we're talking Google and Microsoft and AWS and everybody else whose business is going to be impacted positively or negatively.' I think MLCommons can play a strong role here in helping the industry understand which hardware can deliver what economics in providing inference processing for very large language models."

"We want our benchmarks to be used to compare solutions to help people buy as well as make design choices – figure out if a given technique is actually worth pursuing," said Kanter. "The reality is the best benchmark is always the workload you run. But you may not be able to share that workload, and it may be specific only to you. And in reality, most folks using ML are using a variety of different workloads."