Nearly all internet traffic now passes through data centers, and the popularity of generative AI applications such as ChatGPT has created unprecedented demand for computing power. Data centers around the world are deploying as many high-performance GPUs and CPUs as they can.
This, in turn, raises the bar for electricity and energy supply. Reportedly, a single interaction with a large AI model can consume roughly as much electricity as a low-brightness LED bulb running for an hour.
AI's energy consumption includes not only the electricity that powers the servers but also the additional energy required to cool the data centers. On average, cooling can add roughly 50% on top of the equipment's normal power draw.
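As a rough illustration of that ~50% figure, the short Python sketch below adds a cooling overhead on top of a hypothetical IT load. The 1 MW load is an assumed example, and the implied PUE is a simplification rather than a measured value.

```python
# Rough illustration of the cooling-overhead claim above (illustrative numbers,
# not measured data): if cooling adds ~50% on top of the IT load, the facility
# draws about 1.5x the server power.

def facility_power(it_load_kw: float, cooling_overhead: float = 0.5) -> float:
    """Total facility draw = IT load + cooling's share of the IT load."""
    return it_load_kw * (1.0 + cooling_overhead)

if __name__ == "__main__":
    it_load_kw = 1_000.0  # hypothetical 1 MW of servers
    total_kw = facility_power(it_load_kw)
    print(f"IT load: {it_load_kw:.0f} kW, total with cooling: {total_kw:.0f} kW")
    # Effective PUE under this simplification: total facility power / IT power
    print(f"Implied PUE: {total_kw / it_load_kw:.2f}")
```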
As AI and high-performance computing advance, chips, servers, and racks in data centers are being packed ever more densely. This density demands more powerful cooling systems to keep the equipment within a safe temperature range and preserve system performance and reliability.
Cooling has reportedly become the fastest-growing part of data centers' physical infrastructure costs, with a compound annual growth rate of 16%, and for facilities sustaining high-performance operation, cooling costs are outgrowing what existing infrastructure can support. According to MIT Lincoln Laboratory, data centers could consume up to 21% of the world's electricity supply by 2030.

To address AI's energy consumption, the industry has not only developed specialized custom AI chips to improve energy efficiency but has also adopted more efficient cooling technologies to help data centers operate as sustainably as possible.
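To put the 16% compound annual growth rate quoted above in perspective, the sketch below simply compounds an indexed cooling cost forward. The base index of 100 and the time horizons are hypothetical; only the 16% rate comes from the figure cited above.

```python
# What a 16% CAGR means for cooling costs over time (base cost and horizons
# are hypothetical; only the 16% rate is taken from the cited figure).

def project_cost(base_cost: float, cagr: float, years: int) -> float:
    """Compound the base cost forward by `years` at the given annual rate."""
    return base_cost * (1.0 + cagr) ** years

if __name__ == "__main__":
    base = 100.0  # index today's cooling cost at 100
    for y in (1, 3, 5, 10):
        print(f"Year {y:2d}: cost index {project_cost(base, 0.16, y):6.1f}")
    # At 16% CAGR the cost roughly doubles every ~4.7 years (ln 2 / ln 1.16).
```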
Recently, a company called ZutaCore demonstrated the industry's first dielectric direct-to-chip liquid cooling cold plate for NVIDIA GPUs: a waterless, direct-to-chip, two-phase liquid cooling system designed specifically for AI and high-performance computing workloads. The company collaborates with suppliers such as Intel, Dell, and Rittal, and several server manufacturers are currently working with ZutaCore to complete certification and testing for the NVIDIA GPU platform.
As traditional air-based cooling is gradually phased out, liquid cooling opens new possibilities for data centers. However, water-based cooling solutions consume large amounts of water and still face challenges in improving energy efficiency and reducing environmental impact.
ZutaCore's "HyperCool" cooling solution does not rely on water as a cooling medium but uses a special dielectric liquid. This cooling method directly contacts the cooling liquid with the chips that need to be cooled, which can more effectively absorb and remove heat compared to traditional air cooling or indirect liquid cooling. HyperCool technology can also recover and reuse the heat generated by data centers, achieving 100% heat reuse.
The following figure shows how the HyperCool system works and how the recovered thermal energy can be used in schools, offices, and homes.

The HyperCool Dielectric Cold Plate is the core component of the system and mounts directly on the chips that require cooling. It uses a water-free dielectric liquid with excellent heat-transfer properties: the fluid is non-conductive and has an extremely low Global Warming Potential (GWP) and Ozone Depletion Potential (ODP).
As the dielectric liquid absorbs heat from the chips, it evaporates into vapor. The HyperCool Heat Rejection Unit then expels the heat carried by that vapor; in the process, the dielectric fluid condenses back into a liquid and circulates to the cold plate to absorb heat again.
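A back-of-the-envelope calculation shows why a two-phase cycle like this removes heat effectively: in steady state most of the heat leaves as latent heat of vaporization, so the required coolant flow is roughly the chip's heat output divided by the fluid's latent heat. The sketch below assumes a representative latent heat of about 140 kJ/kg for an engineered dielectric fluid and a 700 W chip; neither number is a ZutaCore specification.

```python
# Steady-state two-phase cooling estimate: the mass flow of dielectric liquid
# needed to carry away the chip's heat purely by evaporation is Q / h_fg.
# The latent heat below is an assumed, representative value, not vendor data.

def required_flow_kg_per_s(chip_heat_w: float, latent_heat_j_per_kg: float) -> float:
    """Mass flow of dielectric liquid needed to absorb chip_heat_w by evaporation."""
    return chip_heat_w / latent_heat_j_per_kg

if __name__ == "__main__":
    chip_heat_w = 700.0   # e.g. a 700 W GPU package
    h_fg = 140_000.0      # assumed ~140 kJ/kg latent heat (illustrative)
    flow = required_flow_kg_per_s(chip_heat_w, h_fg)
    print(f"Approx. dielectric flow to evaporate away 700 W: {flow * 1000:.1f} g/s")
```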
The heat expelled from the heat rejection unit can be recovered through the facility's water system. The recovered heat can be used to heat offices and homes or to provide thermal energy for school heating systems, achieving 100% sustainability.
This direct-to-chip cooling solution is more efficient, using less than half the energy and space of traditional systems. The entire system is designed to recover and utilize the waste heat from data centers, reducing energy waste and mitigating environmental impact.
By adopting this efficient cooling technology, data centers can significantly reduce operating costs, particularly the maintenance and energy costs of the cooling system, cutting total cost of ownership by 50%.

Traditional cooling can let performance degrade as temperatures rise, or require thermal management that caps performance. With HyperCool's higher cooling efficiency, data centers can install more servers and processors and support heavier workloads without overheating. Because temperatures are kept under control, processors can run near their design limits for extended periods, raising overall computational output.
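The throttling argument can be made concrete with a toy model: if dissipated power scales roughly with clock speed and the cooler caps how much heat can be removed, the sustainable clock is set by cooling capacity. All figures below (clock, package power, cooler capacities) are hypothetical, and the linear power model is a deliberate simplification rather than a real GPU power model.

```python
# Toy model of thermal throttling: a chip can only hold the clock at which its
# dissipated power does not exceed what the cooler can remove. Power is assumed
# to scale linearly with clock here, purely for illustration.

def sustained_clock_ghz(max_clock_ghz: float, power_at_max_w: float,
                        cooling_capacity_w: float) -> float:
    """Clock the chip can hold indefinitely without exceeding the cooler's capacity."""
    if cooling_capacity_w >= power_at_max_w:
        return max_clock_ghz  # enough thermal headroom: no throttling
    return max_clock_ghz * cooling_capacity_w / power_at_max_w

if __name__ == "__main__":
    max_clock, power_at_max = 1.8, 700.0      # hypothetical GPU figures
    for capacity in (450.0, 600.0, 750.0):    # heat-limited vs. well-cooled cases
        clk = sustained_clock_ghz(max_clock, power_at_max, capacity)
        print(f"Cooler capacity {capacity:.0f} W -> sustained clock {clk:.2f} GHz")
```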
Not only does this avoid the consumption of water resources and potential leakage risks, but the computational performance of data centers is also expected to increase by up to 10 times.
It is worth mentioning that the HyperCool system allows operators to upgrade with minimal changes to existing infrastructure, increasing processing capabilities while reducing energy and space usage. This is beneficial for cloud service providers and large enterprises that frequently need to rapidly expand their computing capabilities.
Additionally, each NVIDIA H100 GPU currently draws as much as 700 W, a significant challenge for data centers already under pressure to control heat, energy consumption, and space. According to ZutaCore, HyperCool can reduce cooling energy consumption by 80%, support GPUs drawing over 1,500 W, and increase rack density by 300%.
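These rack-level numbers can be combined into a quick, illustrative comparison. The 700 W per GPU, the 80% cooling-energy reduction, and the 300% density increase come from the claims above; the GPUs per server, servers per rack, and the air-cooled cooling overhead are assumed for the example.

```python
# Illustrative rack comparison. Cited figures: 700 W per GPU, 80% cooling-energy
# reduction, +300% rack density. Assumed figures: 8 GPUs per server, 4 servers
# per air-cooled rack, ~50% cooling overhead for the air-cooled baseline.

GPU_POWER_W = 700          # cited per-H100 draw
GPUS_PER_SERVER = 8        # assumed HGX-style node
SERVERS_PER_RACK_AIR = 4   # assumed air-cooled baseline

rack_it_air_kw = GPU_POWER_W * GPUS_PER_SERVER * SERVERS_PER_RACK_AIR / 1000
rack_it_liquid_kw = rack_it_air_kw * (1 + 3.0)            # "+300% rack density"

cooling_per_it_air = 0.5                                  # assumed ~50% overhead
cooling_per_it_liquid = cooling_per_it_air * (1 - 0.8)    # "80% less cooling energy"

print(f"Air-cooled rack:    {rack_it_air_kw:.1f} kW IT, "
      f"{rack_it_air_kw * cooling_per_it_air:.1f} kW cooling")
print(f"Liquid-cooled rack: {rack_it_liquid_kw:.1f} kW IT, "
      f"{rack_it_liquid_kw * cooling_per_it_liquid:.1f} kW cooling")
```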
In summary, cooling is a key factor in keeping data center hardware efficient and extending equipment life, and as data centers grow in scale and computational demand, efficient cooling solutions are becoming increasingly important.

Industry insiders are paying close attention to solutions that can keep up with the explosive growth of data centers while being more sustainable. Beyond HyperCool's approach, Chinese companies such as Lan Yang Technology offer immersion liquid cooling, providing highly efficient, energy-saving cooling products and services for data centers, PC hosts, new-energy vehicles, energy-storage batteries, 5G base stations, projectors, and aerospace applications.
With consistently optimized cooling, data center hardware can operate at a higher performance level, avoiding the performance fluctuations caused by temperature issues. The result is computing capability well beyond that of traditional facilities, which is particularly important for applications that rely on high-performance computing, such as artificial intelligence and big data analysis.
The introduction of technologies similar to HyperCool may change the way data centers are designed and managed, promoting the entire industry to develop towards a more efficient and environmentally friendly direction.