Nvidia’s RTX 4090 Launch: A Strong Ray-Tracing Focus
Recently, Nvidia announced their RTX 4000 series at GTC 2022. At a high level, Nvidia took advantage of a process node shrink to evolve their architecture while also scaling it up and clocking it much...
China’s Phytium D2000: Building on A72?
China has been investing heavily into homegrown microprocessors for the past few years. Phytium’s D2000 is one example of a homegrown design, and we’ll be taking a close look at it in this article....
Hot Chips 34 – Biren’s BR100: A Machine Learning GPU from China
Previously, we covered Tesla’s Dojo, an unconventional CPU architecture specifically designed for neural network training. We also looked at AMD’s CDNA2, a GPU architecture focused on high performance...
Skylake: Intel’s Longest Serving Architecture
Intel debuted Skylake in 2015. Then Skylake variants filled out major parts of Intel’s lineup for the next six years. Skylake faced no serious competition at launch, but wound up holding the line...
Microbenchmarking Intel’s Arc A770
Intel’s Arc GPUs represent the company’s third attempt at taking on the dedicated GPU market. The company’s first attempt, the i740, tried to use the then-new AGP interface to store textures in system...
Addendum: Clock Ramp on ADL, Zen 4, M1, and More
We recently released an article on how quickly CPUs increased their clocks from idle, and drew criticism for not including CPUs newer than Kaby Lake. That was because I just didn’t have...
Why you can’t trust CPUID
While AMD is most certainly planning more processors based on the Zen 4 architecture, unfortunately many of the “leaked” Geekbench 5 results that made the rounds on social media yesterday are faked....
Microbenchmarking Nvidia’s RTX 4090
Nvidia’s RTX 4090 features Nvidia’s newest architecture, named Ada Lovelace after a pioneer in early computing. Compared to their previous architecture, Ampere, Ada Lovelace enjoys a process node...
AMD’s Zen 4 Part 1: Frontend and Execution Engine
AMD’s Zen 4 architecture has been hotly anticipated by many in the tech sphere; as a result many rumors were floating around about its performance gains prior to its release. In February 2021 we...
AMD’s Zen 4, Part 2: Memory Subsystem and Conclusion
Please go through part 1 of our Zen 4 coverage, if you haven’t done so already. This article picks up where the previous one left off. To summarize, Zen 4 has made several moves in the frontend and...
Cannon Lake: Intel’s Forgotten Generation
Palm Cove (also commonly known as Cannon Lake) is Intel’s 10 nm die shrink of the Skylake core. In Intel’s “Tick Tock” strategy of alternating major microarchitecture changes and ports to new process...
Knight’s Landing: Atom with AVX-512
Intel is known for their high performance cores, which combine large out of order execution engines with high clock speeds to maximize single threaded performance. But Intel also has the “Atom” line...
Was Rocket Lake Power Efficient?
Compared to Golden Cove? Of course not. A newer architecture on a newer process should have a pretty huge advantage. Unsurprisingly, Golden Cove is faster at any power level. So let’s go for more...
Golden Cove’s Lopsided Vector Register File
Ever since Intel first brought AVX-512 to their big core lineup, they made all of the vector registers 512-bit. They extended the vector register file on the opposite side of the execution units, and...
AMD’s Zen 4, Part 3: System Level Stuff, and iGPU
We covered Zen 4’s core architecture in depth in two articles. This one will focus on anything we didn’t manage to get to. Some of these details may be specific to the particular CPU sample we have,...
Microbenchmarking AMD’s RDNA 3 Graphics Architecture
Editor’s Note (6/14/2023): We have a new article that reevaluates the cache latency of Navi 31, so please refer to that article for some new latency data. RDNA 3 represents the third iteration of...
Golden Cove’s Vector Register File: Checking with Official (SPR) Data
In late December 2022, we published an article going over how Intel optimized Golden Cove’s vector register file to handle AVX-512 while minimizing area overhead. A few days ago, Intel published data...
Bulldozer, AMD’s Crash Modernization: Frontend and Execution Engine
AMD’s K7 Athlon architecture formed the basis of the company’s CPU offerings for around a decade. Athlon did very well against Intel’s P6 based Pentium III. K8 got the basics right, introduced 64-bit...
Bulldozer, AMD’s Crash Modernization: Caching and Conclusion
In Part 1, we looked at Bulldozer’s core architecture. But the architecture itself isn’t the full story. Memory advances have not kept up with CPU speed, so modern CPUs cope with increasingly...
Previewing China’s Loongson 3A5000 with Performance Counters
Loongson’s 3A5000 represents another domestic CPU effort from China. It implements four LA464 cores, and targets everything from desktops to servers to embedded applications. Like the Zhaoxin...