Let’s talk about AI Hardware — the horsepower behind AI advancements

Software is limited by hardware development

Most of the latest innovations are backed by software, and software is ultimately backed by hardware. AI is no different. Let’s talk about how AI hardware differs from general-purpose machines. But first, let’s take a step back and look at how advances in hardware can accelerate the impact of AI around the globe.

At first glance, this hardware looks costly, so why are we going down this path? It can seem like investing in new ways of consuming a fossil fuel that we know will be exhausted very soon. The simple answer is that we use the hardware horsepower once to train the system and then, through transfer learning, reuse the trained model on millions of commercial systems with far more modest hardware. These economies of scale help spread the benefit and, in turn, reduce the environmental cost of high power consumption per deployed system. Moreover, we can improve the hardware itself, optimizing it for lower power consumption and higher speed, which leaves extra time and power for more analysis and trials.
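To make the "train once, reuse everywhere" idea a little more concrete, here is a minimal sketch of one common form of transfer learning: keep an already-trained feature extractor frozen and fit only a tiny head on the low-powered device. Everything in it is an assumption made for illustration: pretrained_features is a hypothetical stand-in for the expensive network, and the toy data and training settings are made up.

```python
def pretrained_features(x):
    """Hypothetical stand-in for a large, already-trained network.

    It is frozen: nothing in this function is updated during training.
    """
    return [x, x * x]


def train_head(samples, epochs=2000, learning_rate=0.01):
    """Fit only a small linear head on top of the frozen features.

    Uses plain per-sample gradient descent on the squared error.
    """
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for x, target in samples:
            features = pretrained_features(x)
            prediction = sum(w * f for w, f in zip(weights, features)) + bias
            error = prediction - target
            weights = [w - learning_rate * error * f
                       for w, f in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias


# Toy task (made up): targets follow 3*x^2 - 2*x + 1.
samples = [(x, 3 * x * x - 2 * x + 1) for x in (-2, -1, 0, 1, 2)]
head_weights, head_bias = train_head(samples)
print(head_weights, head_bias)
```

The point of the sketch is the split in cost: the heavy part is computed once and shared, while each device only has to learn three numbers.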

An improvement of even 100% would already be a lot. The new machine that Google has come up with, however, improves on its predecessor by 250%. Check out the comparison graph below, published by Google.

Improvements from Google’s 3rd to 4th gen tensor processing units (TPUs).

The race to develop hardware for AI started long ago, in the 90s. Companies wanted more powerful hardware, so they turned to NVidia’s GPUs, which are known for performing huge numbers of calculations, especially floating-point calculations, in very little time. But GPUs are costly and power-hungry, which is why they drain the battery and heat up the system when we play graphics-intensive video games. For commercial purposes, however, they work really well.

But it seems this was not enough. Hardware companies asked themselves: what can we do if we cannot create competing software? They played to their strengths and set out to build better, specialized hardware for AI. Obviously, it will consume more power, but big tech companies are not too concerned about power consumption as of now (please see the piece where I have discussed this in detail). The big tech players on the list are Alphabet, Apple, NVidia, Samsung, Intel, Qualcomm, Baidu, and AMD.

So, what is the difference between AI hardware and normal hardware?

When we say AI hardware, we mean AI accelerators. These are microprocessors designed mainly for a humongous number of simple calculations. High precision and complex operations are not required; what matters is the sheer volume of simple arithmetic, which is very common in AI applications. The GPUs currently used in AI applications do the same thing, but in large chunks, which is why they have to go multicore, physically or at least virtually. At commercial scale, however, it becomes very costly to train networks like GPT-3, which I talked about recently. As the world moves towards training on more and more data, there is an unquenchable thirst for hardware that delivers more computational power yet remains cost-efficient. It also calls for cloud and edge computing support with faster insights. This in turn drives the search for new materials and new architectures in the domain.
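The claim that volume matters more than precision can be tried out in a few lines. The sketch below squeezes two small vectors into 8-bit integers, does the multiply-accumulate entirely in integers, and compares the result with the full-precision answer. The scheme (one shared scale per vector) and the sample numbers are assumptions chosen for illustration, not a description of how any particular accelerator works.

```python
def quantize(values, num_bits=8):
    """Map floats to signed integers using one shared scale per vector."""
    max_abs = max(abs(v) for v in values) or 1.0  # avoid a zero scale
    largest_int = 2 ** (num_bits - 1) - 1  # 127 for 8 bits
    scale = max_abs / largest_int
    return [round(v / scale) for v in values], scale


def quantized_dot(a, b, num_bits=8):
    """Dot product computed with integer multiply-accumulate only."""
    quantized_a, scale_a = quantize(a, num_bits)
    quantized_b, scale_b = quantize(b, num_bits)
    accumulator = sum(x * y for x, y in zip(quantized_a, quantized_b))
    # A single floating-point rescale at the very end.
    return accumulator * scale_a * scale_b


# Made-up sample values standing in for weights and activations.
weights = [0.12, -0.53, 0.88, 0.07]
inputs = [1.4, -2.0, 0.25, 3.0]

exact = sum(w * x for w, x in zip(weights, inputs))
approximate = quantized_dot(weights, inputs)
print(exact, approximate)
```

The two results agree closely even though every multiplication in the second one was on small integers, which is much cheaper to build in silicon than full floating-point arithmetic.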

AI Hardware Architectures

The architectures can be divided into two main types: Von Neumann and on-chip. The former achieves parallelism mainly in code, or rather in kernels that perform the matrix operations at the heart of ML algorithms. The on-chip architecture, by contrast, performs the operations in the hardware itself, using virtual weights and digital activations for the neural calculations. An analog approach has also come up. It carries a higher risk but holds significant promise if it works out. Below is the classification that I am talking about.

Comparison of two types of GPU architecture

Image credit:
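Why do matrix operations lend themselves to parallel hardware? A rough sketch helps: in a matrix product, every output cell is its own independent run of multiply-accumulate steps, so no cell has to wait for another. The code below is plain sequential Python written only to expose that structure; the function names are made up for this illustration and it is not how a real kernel is implemented.

```python
def output_cell(a, b, row, col):
    """One output cell: a chain of multiply-accumulate steps.

    It reads only row `row` of a and column `col` of b, so it never
    depends on any other output cell.
    """
    return sum(a[row][k] * b[k][col] for k in range(len(b)))


def matmul(a, b):
    """Matrix product built from independent cells.

    A parallel kernel could hand each (row, col) pair to its own thread.
    """
    rows, cols = len(a), len(b[0])
    return [[output_cell(a, b, r, c) for c in range(cols)]
            for r in range(rows)]


def multiply_accumulate_count(a, b):
    """Total multiply-accumulate steps: rows * columns * shared dimension."""
    return len(a) * len(b[0]) * len(b)


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
product = matmul(a, b)
print(product)
print(multiply_accumulate_count(a, b))
```

For these two 2x2 matrices that is only 8 steps, but the count grows with the product of all three dimensions, which is why hardware that runs many cells at once pays off so quickly.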

Now let me take one piece of hardware as an example and talk about its technical specs. One of the latest advancements in the AI hardware realm is Graphcore’s Colossus Mk2.

Technical Specs

UK-based startup Graphcore has made the bold claim of developing the world’s fastest AI chip. It has about 60 billion transistors (compared with 54 billion in NVidia’s latest and 1.2 trillion in Cerebras’s flagship). Colossus Mk2 has nearly 1,500 independent cores that can run close to 9,000 parallel threads. Graphcore claims 10¹⁵ floating-point operations per second (one petaflop) for the IPU-Machine M2000, the system built around the new chip, and up to 16 exaflops of compute when such machines are scaled out into large clusters, whereas competing systems are yet to reach 1 exaflop. Developers can code directly in Python and C++ on the platform. It is planned to start shipping in the fourth quarter of 2020.
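A quick back-of-the-envelope check puts these numbers in proportion. The sketch below uses only the rounded figures quoted above and takes them at face value. It says nothing about how Graphcore actually wires systems together.

```python
# Rounded figures as quoted in the text, taken at face value.
cores = 1500
parallel_threads = 9000
base_flops = 10 ** 15        # one petaflop
peak_flops = 16 * 10 ** 18   # sixteen exaflops

# How many threads each core runs at once.
threads_per_core = parallel_threads // cores

# How many petaflop-sized units it takes to add up to the peak figure.
units_for_peak = peak_flops // base_flops

print(threads_per_core)
print(units_for_peak)
```

So each core juggles 6 threads, and the 16 exaflop figure is 16,000 times the petaflop figure, a reminder that the peak number describes a very large installation and not a single box.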

Even after these hefty claims and big innovations, experts comment that software evolving faster than hardware will once again stall progress in AI, as it did around the 80s. Microsoft’s supercomputer for OpenAI, OpenAI’s GPT-3, and the latest and most innovative Google AI, which can now prove new and previously unknown theorems, are already making big dents in science, technology, and humanity overall. I think progress will stall eventually, but not for at least a decade!

Product Enthusiast — Utilizing the power of AI and Design to rethink possibilities and reframe the problem statement! Website:
