Google’s Secret Weapon: Inside the Lab Where TPUs Power AI’s Future
Inside a sprawling lab at Google headquarters, hundreds of server racks buzz with activity. They aren’t powering the world’s dominant search engine or executing workloads for Google Cloud; they’re running tests on Google’s own microchips, Tensor Processing Units (TPUs), the secret weapon behind the company’s ambitions in generative AI.

While Nvidia dominates the market for chips that train AI models, Google has forged its own path with TPUs, custom-built for AI workloads and available to cloud customers since 2018. Apple, too, relies on Google’s TPUs to train its AI models.
Key Takeaways:
- Google’s TPUs are custom-built for AI, offering superior efficiency compared to general-purpose chips.
- Google’s commitment to custom chips, particularly TPUs, is a key driver behind the growth of its cloud business.
- Apple’s reliance on Google’s TPUs highlights the growing importance of custom chips in AI development.
- The race for AI dominance has led to a surge in chip innovation, with Google, Nvidia, and others investing heavily in this critical technology.
‘A Simple But Powerful Thought Experiment’
In 2014, Google began exploring custom hardware after a simple thought experiment. The question: what would happen if every Google user interacted with the company by voice for just 30 seconds a day? The answer: compute demand would spike so sharply that Google would need to double the number of computers in its data centers.
To close that gap, Google developed the TPU, which proved to be 100 times more efficient than general-purpose hardware for the AI tasks it was designed for. Google’s data centers still rely on CPUs and GPUs, but TPUs have become a vital component of its AI strategy.
TPUs are a type of ASIC (application-specific integrated circuit) specifically designed for AI, while other custom chips, like Google’s Video Coding Unit, are focused on video processing. Google also makes custom chips for its devices, mirroring Apple’s custom silicon strategy. The Tensor G4 powers Google’s latest Pixel 9, while its A1 chip powers the Pixel Buds Pro 2.
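What "custom-built for AI" means in practice is that a TPU is organized around large matrix-multiply units rather than general-purpose cores. A short, hedged sketch: JAX, one of the frameworks Google runs on TPUs, compiles array programs through the XLA compiler, which targets a TPU when one is present and falls back to GPU or CPU otherwise. The `dense_layer` function below is an illustrative example, not Google’s code.

```python
import jax
import jax.numpy as jnp

@jax.jit  # trace and compile for whatever accelerator is available
def dense_layer(x, w):
    # A matrix multiply followed by a nonlinearity: the core pattern
    # that TPU matrix units are built to accelerate.
    return jax.nn.relu(x @ w)

x = jax.random.normal(jax.random.PRNGKey(0), (8, 128))
w = jax.random.normal(jax.random.PRNGKey(1), (128, 64))
out = dense_layer(x, w)

print(out.shape)                  # (8, 64)
print(jax.devices()[0].platform)  # "tpu" on Cloud TPU hardware
```

The same program runs unchanged on a laptop CPU or a Cloud TPU pod; only the backend XLA compiles to differs, which is what makes an AI-specific ASIC usable behind a general-purpose framework.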
Broadcom and TSMC: Partners in Innovation
Developing custom chips is an expensive and complex undertaking, requiring expertise and resources beyond what Google has in-house. That is why the company partners with Broadcom and TSMC.
Broadcom contributes key pieces of the design: the peripheral I/O (input/output) blocks, the SerDes (serializer/deserializer) links that move data on and off the chip, and the packaging, while Google focuses on the core compute. The split illustrates how specialized AI chip development has become, with each partner handling the part of the design it knows best.
The final step falls to TSMC, the world’s largest contract chipmaker, which manufactures the chips from Google’s design. That reliance raises questions about what would happen if the geopolitical situation between China and Taiwan deteriorates; Google says only that it is "hopeful" those concerns won’t materialize.
Processors and Power: A New Era Begins
Google’s commitment to custom silicon now extends beyond TPUs. The company recently launched its Axion CPU, a processor it concedes it was "late" to build, but one timed to meet growing demand for efficient general-purpose computing alongside AI workloads.
Axion’s arrival marks a significant shift in Google’s AI strategy, as it now encompasses custom chips for various needs:
- TPUs: For AI training and inference.
- Video Coding Units: For video processing.
- Axion CPU: For general-purpose computing, powering services like BigQuery, Spanner, and YouTube advertising.
- Custom chips for devices: Like the Tensor G4 and A1.
The Axion CPU is built on the Arm architecture, a power-efficient alternative to the traditional x86 designs from Intel and AMD. That efficiency matters: AI servers’ power consumption is projected to rival that of a country like Argentina by 2027. Google says it remains committed to reducing its footprint, and since its third-generation TPU it has used direct-to-chip cooling, which cuts the water consumed in cooling servers.
The Future of AI: Hardware is King
Despite the complexities involved in designing, manufacturing, and powering custom chips, Google remains committed to its generative AI development and the crucial role hardware plays. This dedication is reflected in the ongoing development of Google’s sixth-generation TPU, Trillium, scheduled for release later this year.
Competition in generative AI has set off a "rat race" of chip innovation, with Google, Nvidia, Amazon, Microsoft, Alibaba, and others investing heavily in custom silicon to gain an edge. Google’s decision to build its own chips is a long-term bet, one that positions the company as a major player in the race for AI dominance.