Tegra 3 - Design Perspective
Category : Reviews
Published by Marc Büchel on 13.12.11
These days a lot is changing at NVIDIA. The manufacturer that was once the market leader for discrete PC and workstation graphics cards faces growing competition from Intel and AMD and their integrated graphics units. As Moore's Law progresses, integrated graphics will soon compete with discrete mid-range desktop products, and NVIDIA needs to be ready for that moment.

NVIDIA is aware of this situation and has decided to invest in developing processors for the tablet market. Tegra 2 was a first product that won over manufacturers such as ASUS, which put it into its quite successful Transformer tablet PC. For Tegra 2, NVIDIA licenses ARM cores and builds its own SoCs (System on a Chip) around them. Most of these chips have ended up in tablets rather than smartphones.



NVIDIA has its Tegra 2 processors manufactured at TSMC, using the foundry's 40 nanometer triple gate oxide process (LPG). The dual-core processors, which are based on ARM's Cortex-A9 design, are optimized for performance, which is why clock frequencies of up to one gigahertz could be realized. Compared to its competitors, however, Tegra 2 had to contend with high leakage currents, a consequence of the LPG process. TSMC also offers an LP process whose leakage currents are orders of magnitude lower than those of the LPG process. Market leaders such as Qualcomm, TI and Samsung are using this process at the moment. The high leakage is also the reason why Tegra 2 hardly made it into smartphones: when the phone is locked, the background processes would need too much power and drain the battery far too quickly.


Page 1 - Introduction
Page 2 - Less Leakage Power
Page 3 - Cache Hierarchy and Clock Speeds
Page 4 - Is it the right way to go?



Less Leakage Power

At this point NVIDIA had several options. One was to stick with the Tegra 2 design and optimize it for lower leakage power. Another, more radical approach would have been to move to TSMC's LP manufacturing process, but this would have set the company back in terms of performance. Instead NVIDIA chose a much more creative route: a five-core SoC in which the fifth core is called the companion core. This companion core is built on TSMC's LP process, which means its leakage power is very low. It takes over when a Tegra 3 device is locked, for example, so all the background processes run on a core that uses far less power. Furthermore NVIDIA integrated power gating, which means that the core logic can be deactivated. Cores that are not needed are shut down and then draw virtually no power. In this way NVIDIA elegantly solved the leakage problem, and as a consequence there could now even be smartphones based on NVIDIA's Tegra 3 SoC.
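The takeover behaviour described above can be illustrated with a small sketch. It is a toy model only, not NVIDIA's actual scheduler: the names `CoreState` and `select_cores`, the `busy_threads` input and the rule of one performance core per busy thread are assumptions made for illustration. It also assumes that the companion core is switched off while performance cores are running, which the text above does not state.

```python
from dataclasses import dataclass

PERFORMANCE_CORES = 4  # the four main cores mentioned in the article


@dataclass(frozen=True)
class CoreState:
    companion_on: bool   # low-leakage companion core is powered
    performance_on: int  # number of performance cores powered (0..4)


def select_cores(device_locked: bool, busy_threads: int) -> CoreState:
    """Hypothetical policy for choosing which cores stay powered."""
    if busy_threads < 0:
        raise ValueError("busy_threads must be non-negative")
    if device_locked:
        # Background work moves to the companion core; all
        # performance cores are power gated.
        return CoreState(companion_on=True, performance_on=0)
    # Unlocked: power one performance core per busy thread (assumed rule),
    # at least one and at most four; the rest stay gated.
    needed = max(1, min(busy_threads, PERFORMANCE_CORES))
    return CoreState(companion_on=False, performance_on=needed)
```

In this model a locked device with three busy background threads powers only the companion core, while an unlocked device with two busy threads powers two performance cores and keeps the other two gated.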

A look at other parts of the SoC reveals that Tegra 3 has evolved elsewhere, too. The new SoC features NEON support, for example, which is implemented via an ARM MPE (Media Processing Engine). To keep the Tegra 2 die as compact as possible, NVIDIA had decided not to support NEON on that chip. NEON is a SIMD instruction set extension that can accelerate 2D and 3D graphics workloads as well as sound synthesis.



A closer look at the GPU doesn't reveal many differences; here, too, there is more evolution than revolution. Whereas Tegra 2 had four pixel and four vertex shaders, Tegra 3 has twice as many pixel shader units but still the same number of vertex processors. The core count therefore went up from eight to twelve.
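The core count follows directly from the shader figures quoted above. A minimal check, using only those figures (the dictionary layout and the `total_cores` helper are just illustrative names):

```python
# Shader unit counts as given in the text.
tegra2_gpu = {"pixel": 4, "vertex": 4}

# Tegra 3 doubles the pixel shaders and keeps the vertex processors.
tegra3_gpu = {
    "pixel": 2 * tegra2_gpu["pixel"],
    "vertex": tegra2_gpu["vertex"],
}


def total_cores(gpu: dict) -> int:
    """Sum all shader units of a GPU description."""
    return sum(gpu.values())


print(total_cores(tegra2_gpu), total_cores(tegra3_gpu))
```

Doubling only the pixel shaders takes the total from eight to twelve, so the GPU gains 50 percent more units rather than twice as many.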



Cache Hierarchy and Clock Speeds

As for the cache hierarchy, NVIDIA enlarged neither the L1 nor the L2 cache. Every core gets 32 KB/32 KB of L1 cache (instruction/data), and all four cores share a 1 megabyte L2 cache. Doubling the number of cores compared to Tegra 2 without increasing the L2 cache size suggests that NVIDIA doesn't expect many applications to make use of all four cores. Nevertheless, the L2 cache is now faster by two cycles on Tegra 3. For the L1 cache there is no such improvement.
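A rough way to see what the unchanged L2 size means is to look at the average share per core. The sketch below assumes, purely for illustration, that the shared cache is split evenly between all active cores; a real shared L2 is not statically partitioned, so these are averages and not guarantees. The function name `l2_share_kb` is hypothetical.

```python
L2_SIZE_KB = 1024  # 1 megabyte shared L2, same on Tegra 2 and Tegra 3


def l2_share_kb(active_cores: int) -> int:
    """Average L2 share per core under an even split (illustrative only)."""
    if active_cores < 1:
        raise ValueError("need at least one active core")
    return L2_SIZE_KB // active_cores


tegra2_share = l2_share_kb(2)  # dual-core Tegra 2
tegra3_share = l2_share_kb(4)  # all four Tegra 3 performance cores busy
print(tegra2_share, tegra3_share)
```

With all four cores busy the average share per core is half of what a Tegra 2 core had, which is why the unchanged cache size only matters for software that actually keeps four cores occupied.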



The specifications also cover the clock frequencies. When only one performance core is in use, it tops out at 1.4 gigahertz; with Tegra 2 the maximum clock speed was 1.0 gigahertz. When all four performance cores are active, the maximum frequency is 1.3 gigahertz. Furthermore, power gating allows every single core to be deactivated individually. As a result Tegra 3 only draws more power than Tegra 2 when all four cores are under heavy load; in all other scenarios it is more efficient than its predecessor. Finally there is the companion core, which clocks at a maximum of 500 MHz. As already mentioned, it is manufactured using TSMC's LP process and is therefore optimized for low power consumption.
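The clock figures above can be collected in a small lookup to compare the two generations. This is a sketch under stated assumptions: the text gives maximum clocks only for one and for four active performance cores, so the lookup returns `None` for two or three cores instead of guessing. The "aggregate clock" is a crude upper bound on parallel throughput, not a benchmark result, and all names here are illustrative.

```python
from typing import Optional

TEGRA2_CORES = 2
TEGRA2_MAX_MHZ = 1000

# Maximum clock per number of active performance cores, as stated in the text.
TEGRA3_MAX_MHZ = {1: 1400, 4: 1300}
COMPANION_MAX_MHZ = 500


def tegra3_max_mhz(active_cores: int) -> Optional[int]:
    """Return the stated maximum clock, or None where no figure is given."""
    return TEGRA3_MAX_MHZ.get(active_cores)


def aggregate_mhz(cores: int, mhz: int) -> int:
    """Sum of per-core clocks; ignores memory, cache and scaling effects."""
    return cores * mhz


single_thread_gain = tegra3_max_mhz(1) / TEGRA2_MAX_MHZ
tegra2_aggregate = aggregate_mhz(TEGRA2_CORES, TEGRA2_MAX_MHZ)
tegra3_aggregate = aggregate_mhz(4, tegra3_max_mhz(4))
print(single_thread_gain, tegra2_aggregate, tegra3_aggregate)
```

On clock speed alone, a single thread gains 40 percent over Tegra 2, while the four cores together offer 5200 MHz of aggregate clock against 2000 MHz for the dual-core predecessor.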



Is it the right way to go?

NVIDIA's Tegra 3 is a clever and creative combination of the advantages of TSMC's 40LP and 40LPG manufacturing processes. 40LP, which is used for the companion core, offers very good energy efficiency. The 40LPG process, which NVIDIA uses for the four other cores, delivers a lot of performance. In the end Tegra 3 is a highly competitive product that is both energy efficient and powerful.

It was a strategically sound decision by NVIDIA to invest in the development of SoCs. Given the success of the Apple iPad and the generally booming market for tablet PCs, it makes a lot of sense to have a competitive product in this market. Furthermore, two of NVIDIA's competitors, namely AMD and Intel, don't even have an SoC ready. Intel's SoC version of Atom turned out to be a flop, as it wasn't competitive in terms of energy efficiency. With AMD the situation is even worse: the company hasn't even touched the SoC market to date. Its new CEO Rory Read is now at least pushing in this direction, but it will take at least two to three years until AMD can offer a solid product. The big players in SoCs at the moment are Qualcomm, Texas Instruments and Samsung. NVIDIA currently has a comfortable lead over Qualcomm and Texas Instruments, because their first quad-core SoCs will reach the market in three to six months at the earliest. Those two SoCs, however, are expected to be manufactured at 28 nanometers. At least NVIDIA has a comfortable time window now, and we're sure the company is already preparing its own transition to a 28 nanometer manufacturing process.

