The China-specific version of Gaudi 3 needs to significantly reduce its AI performance in order to comply with export regulations.

Media recently reported that Intel is preparing to launch a "special version" of Gaudi 3 for the Chinese market in two hardware forms: an OAM-compatible mezzanine card called HL-328 and a PCIe accelerator card called HL-388. According to the report, Intel disclosed this information in its Gaudi 3 white paper, which says HL-328 will launch on June 24 and HL-388 on September 24.

Strikingly, estimates based on parameters such as core count, operating frequency, and TDP suggest that the China-specific HL-328 may deliver about 92% less performance than the international version of Gaudi 3.

What's different about the China special edition?

In terms of hardware specifications, the China-specific version of Gaudi 3 matches the original: 96MB of on-chip SRAM, 128GB of HBM2e high-bandwidth memory with 3.7TB/s of bandwidth, a PCIe 5.0 x16 interface, and the same decoding standards. However, under U.S. export control rules on AI chips, a high-performance AI chip of this class can be exported to China only if its total processing performance (TPP) is below 4800. This means that the 16-bit performance of the China-specific version of Gaudi 3 cannot exceed 150 TFLOPS.
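
The 150 figure follows from the threshold by simple arithmetic. The sketch below assumes the BIS definition TPP = 2 x MacTOPS x bit length, where MacTOPS is trillions of multiply-accumulate operations per second. The function names are illustrative, and reading the article's 150 figure as this multiply-accumulate ceiling is an assumption rather than something the article states.

```python
# Sketch of the TPP arithmetic behind the 16-bit ceiling.
# Assumption: TPP = 2 * MacTOPS * bit_length (BIS definition), where MacTOPS
# is trillions of multiply-accumulate operations per second.

TPP_CAP = 4800  # threshold cited in the article


def tpp(mac_tops: float, bit_length: int) -> float:
    """Total processing performance for a given MAC rate and operand width."""
    return 2 * mac_tops * bit_length


def max_mac_tops(tpp_cap: float, bit_length: int) -> float:
    """MAC rate at which TPP equals the cap; compliant parts must stay below it."""
    return tpp_cap / (2 * bit_length)


if __name__ == "__main__":
    for bits in (8, 16, 32):
        print(f"{bits}-bit ceiling: {max_mac_tops(TPP_CAP, bits):.0f} MacTOPS")
```

Under this assumption the ceiling halves each time the operand width doubles, which is why the limit bites hardest on the wider formats used for training.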

According to information released by Intel, Gaudi 3 reaches 1835 TFLOPS on FP16/BF16 and, compared with the Nvidia H100, is 40% faster in large-model training and 50% more energy-efficient in inference.
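
One way to arrive at a figure close to the roughly 92% reduction quoted earlier is to compare the 150 TFLOPS ceiling with the 1835 TFLOPS peak. The sketch below does only that. Whether the cited estimate was actually derived this way is an assumption, and the function name is illustrative.

```python
# Sketch: relative performance drop from the international Gaudi 3 peak
# to the 16-bit ceiling implied by the export threshold.
# Both input figures are taken from the article; the method is an assumption.

INTERNATIONAL_FP16_TFLOPS = 1835  # Intel's published FP16/BF16 figure
CHINA_CEILING_TFLOPS = 150        # 16-bit ceiling cited in the article


def relative_reduction(original: float, reduced: float) -> float:
    """Fraction of the original performance that is lost."""
    if original <= 0:
        raise ValueError("original must be positive")
    return 1 - reduced / original


if __name__ == "__main__":
    drop = relative_reduction(INTERNATIONAL_FP16_TFLOPS, CHINA_CEILING_TFLOPS)
    print(f"Reduction: {drop:.1%}")
```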

Clearly, the China-specific version of Gaudi 3 must give up a great deal of AI performance before it can be exported in compliance with the rules. That means substantially cutting both the number of cores (the original has 8 matrix math engines and 64 tensor processor cores) and the operating frequency.

In July last year, Intel released Gaudi 2 for the Chinese market. Compared with the international version of Gaudi 2, the accelerator cards launched for the Chinese market differ little in performance, but the number of integrated Ethernet RDMA ports was cut from 24 to 21 to comply with U.S. chip export control regulations.

How the United States hijacks computing power

In the 1990s, the United States accounted for more than one-third of global chip production; by 2020 that share had dropped to about 12%. To maintain its lead in semiconductors, the United States has imposed comprehensive semiconductor export controls on China since it enacted the "CHIPS and Science Act" (hereinafter the "Chip Act") in August 2022. From the chips themselves to chip manufacturing equipment, the restrictions keep escalating.

The Chip Act is the centerpiece of the Biden administration's industrial revitalization policy, which uses U.S. government funds to restore domestic production of technology components critical to national security and economic growth. The act bars subsidized companies from the United States and its allies from building or expanding advanced-process chip factories in China and other countries of concern for ten years.

In October 2022 and again in October 2023, the U.S. Department of Commerce's Bureau of Industry and Security (BIS) issued export controls on advanced semiconductors and computing equipment bound for China in an attempt to hold back China's advanced manufacturing. As a result, many GPU and AI chip products from Nvidia, AMD, and Intel can no longer be exported to China, and even the high-end gaming graphics card RTX 4090 has been restricted.

In December 2023, BIS announced an investigation into the semiconductor supply chain at mature process nodes, a move that explicitly targets China's semiconductor industry.

In the early morning of March 30 this year, Beijing time, BIS issued new rules to "implement additional export controls", revising the two sets of export restrictions it formulated in October 2022 and October 2023. The new rules comprehensively restrict sales of Nvidia, AMD, and other more advanced AI chips and semiconductor equipment to China.

In the new rules, the big stick of sanctions is waved again. BIS deleted and revised some restrictions on sales of semiconductor products to China from the United States, Macau (China), and elsewhere. Macau and country group D:5 will be subject to a "presumption of denial" policy, and AI semiconductor products exported from the United States to China will be subject to a "case-by-case review" policy that examines technical level, customer identity, compliance plans, and other information.

Where does Intel's courage come from?

Although it is not yet on the market, Intel's special version of Gaudi 3 is likely to raise some problems. For example, the reduced performance may hurt the user experience and application results of Chinese enterprises, and if the special version has no price advantage, its market competitiveness may suffer. Intel therefore needs to make reasonable trade-offs in product design and pricing.

Two months ago, end products based on the H20, Nvidia's "special edition" AI chip for China, became available for pre-order. They include compute cards and servers equipped with eight H20 compute cards. The H20 delivers about one-sixth the performance of the H100, but its price has not been reduced significantly, so its price/performance ratio is poor.

At the beginning of this year, people familiar with the matter said that large Chinese companies such as Alibaba and Tencent had been testing samples of Nvidia's special chips since last November. These companies have told Nvidia that they will order far fewer chips this year than they had originally planned to buy of the high-performance Nvidia chips that are now banned.

Even though it faces the risk of declining revenue, Intel is still doing well thanks to careful budgeting. Nearly two years after the U.S. government's Chip Act was introduced, the veteran chip giant announced in March that it had received up to $8.5 billion in government subsidies and up to $11 billion in special loan support. The subsidies reportedly come from the Chip Act introduced by the Biden administration in 2022, which aims to help chip companies build more chip factories in the United States and turn the country into a chip manufacturing power. Intel is currently seen as the biggest beneficiary of "chip manufacturing returning to the United States."

From the perspective of the AI market, Nvidia currently holds an absolute advantage in the chip market, and it will not be easy for Intel to win share with its products. Wells Fargo statistics show that Nvidia currently has a 98% share of the data center AI market, while AMD has only 1.2% and Intel less than 1%. For Intel, therefore, following the U.S. government is a prudent way to protect itself.

Computing power is in short supply, and domestic substitution is underway in China

Computing power is the productivity of the big data era. With the rapid development of the digital economy, and especially the explosion of AI, society's demand for computing power is growing rapidly. According to the "China Artificial Intelligence Computing Power Development Assessment Report 2023-2024" jointly released by IDC and Inspur Information, China's intelligent computing power is expected to grow at a compound annual rate of 33.9% from 2022 to 2027, reaching 1117.4 EFLOPS by 2027.
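
To make the growth figure concrete, the sketch below backs out the 2022 starting scale implied by a 33.9% compound annual growth rate and the 1117.4 EFLOPS endpoint. It assumes five compounding periods (2022 to 2027). The function names are illustrative and the result is an implied value, not a number quoted from the report.

```python
# Sketch: compound annual growth rate (CAGR) arithmetic for the forecast.
# Assumption: growth compounds once per year over 5 periods (2022 -> 2027).

CAGR = 0.339              # 33.9% per year, from the report as cited
SCALE_2027_EFLOPS = 1117.4
YEARS = 2027 - 2022


def implied_base(final_value: float, cagr: float, years: int) -> float:
    """Starting value that grows to final_value at the given CAGR."""
    return final_value / (1 + cagr) ** years


def project(base: float, cagr: float, years: int) -> float:
    """Value after compounding base at the given CAGR for `years` periods."""
    return base * (1 + cagr) ** years


if __name__ == "__main__":
    base_2022 = implied_base(SCALE_2027_EFLOPS, CAGR, YEARS)
    print(f"Implied 2022 scale: {base_2022:.1f} EFLOPS")
    for year in range(2022, 2028):
        print(year, f"{project(base_2022, CAGR, year - 2022):.1f} EFLOPS")
```

Under these assumptions the implied 2022 scale is roughly 260 EFLOPS, so the forecast amounts to a little more than a fourfold increase in five years.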

At the same time, staff from the Southern Branch of the China Academy of Information and Communications Technology stated at CITE 2024 that China currently accounts for more than 30% of the world's intelligent computing power but relies mainly on GPU chips from the U.S. company Nvidia; the share of domestic, independently developed computing power is only 5%. The usage rate in China of American AI frameworks such as TensorFlow, PyTorch, and Caffe exceeds 90%.

From an application perspective, mainstream domestic chip manufacturers such as Ascend, Cambricon, and Iluvatar CoreX have completed adaptation to mainstream large models. Industry analysts believe that although a big gap remains compared with the advanced chips of Nvidia and AMD, domestic GPU chips such as the Ascend 910 series can basically support domestic large-model applications. Liu Qingfeng, chairman of iFlytek, said at last year's 1024 Developer Festival that Huawei's GPU capabilities are now comparable to the Nvidia A100, and that the "Flying Star One" large-model computing platform has been launched based on the Ascend ecosystem. Earlier, the Cambricon Siyuan (MLU) series of cloud intelligent accelerator cards was also adapted to the "Zhixiang Multi-modal Large Model" developed in-house by Zhixiang Future, which claims to have reached the level of mainstream international products in both product performance and image quality.

China's large-scale substitution of imported AI chips is accelerating. For Intel, the key is how to meet U.S. policy requirements while still serving the Chinese market, keeping its products competitive and its major customers satisfied. At the same time, this gives China's local AI chip manufacturers a valuable opportunity to develop. These manufacturers need to pay close attention to market dynamics and technology trends to cope with potential competitive pressure.
