Tomek Janiuk

Reputation: 93

How to get L1, L2 & L3 cache sizes using the CPUID instruction on x86

I ran into a problem while preparing an x86 assembly project whose goal is to write a program that reports the L1 data, L1 code, L2 and L3 cache sizes.

I tried to find something in the Intel documentation and on the Internet, but I failed.

THE MAIN PROBLEM IS: On AMD processors it is enough to set the EAX register to 80000005h and 80000006h and read the desired data from the ECX and EDX registers, but on Intel I can obtain this information only for L2.

What should I do to get the L1 & L3 cache sizes on Intel processors?

Upvotes: 8

Views: 9126

Answers (3)

Brendan

Reputation: 37214

For Intel CPUs:

  • for newer CPUs you should use "CPUID, eax=0x00000004" (with different values in ECX)

  • for older CPUs (that don't support the first option) you should use "CPUID, eax=0x00000002". This involves having a table to look up what the values mean. There are cases where the same value means different things for different CPUs and you need additional information (e.g. CPU family/model/stepping).

For VIA CPUs, use the same methods as you would for Intel (with different tables for anything that involves "family/model/stepping").

For AMD CPUs:

  • for newer CPUs you should use "CPUID, eax=0x8000001D" (with different values in ECX)

  • for older CPUs (that don't support the first option) you should use "CPUID, eax=0x80000006" (for L2 and L3 only), plus "CPUID, eax=0x80000005" (for L1 only).
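As a rough illustration of the older AMD path, here is a minimal C sketch of how the size fields in those two leaves can be unpacked. The helper names are made up for this example, and the bit positions are my reading of AMD's CPUID documentation, so check them against the manual for the CPUs you target. The functions take raw register values, so nothing here executes CPUID itself.

```c
#include <stdint.h>

/* CPUID 0x80000005: ECX[31:24] = L1 data cache size in KB,
   EDX[31:24] = L1 instruction cache size in KB.
   (Hypothetical helper names; field layout assumed from AMD's docs.) */
static inline uint32_t amd_l1d_kb(uint32_t ecx_80000005) { return ecx_80000005 >> 24; }
static inline uint32_t amd_l1i_kb(uint32_t edx_80000005) { return edx_80000005 >> 24; }

/* CPUID 0x80000006: ECX[31:16] = L2 cache size in KB. */
static inline uint32_t amd_l2_kb(uint32_t ecx_80000006) { return ecx_80000006 >> 16; }

/* CPUID 0x80000006: EDX[31:18] = L3 cache size in units of 512 KB. */
static inline uint32_t amd_l3_kb(uint32_t edx_80000006)
{
    return (edx_80000006 >> 18) * 512u;
}
```

The lower bits of the same registers carry associativity and line size, which this sketch ignores.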

For all other cases (very old Intel/VIA/AMD CPUs, CPUs from other manufacturers):

  • use CPU "vendor/family/model/stepping" (from "CPUID, eax=0x00000001") with a table (or maybe 1 table per vendor) so you can search for the right CPU in your table/s and get the information that way.

  • if CPUID is not supported there are ways to try to narrow down the possibilities and determine what the CPU is with reasonable accuracy; but mostly you should just give up.
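For the table-lookup approach, the family/model/stepping key has to be unpacked from the EAX value returned by "CPUID, eax=0x00000001". A minimal C sketch follows; the struct and function names are hypothetical. It follows Intel's rule for combining the extended fields. AMD documents a slightly different rule (the extended model is applied only when the base family is 0xF), so treat this as an assumption to verify per vendor.

```c
#include <stdint.h>

/* Hypothetical container for the decoded CPU signature. */
struct cpu_signature {
    uint32_t family;
    uint32_t model;
    uint32_t stepping;
};

/* Decode EAX from CPUID leaf 1 using Intel's display family/model rule. */
static inline struct cpu_signature decode_signature(uint32_t eax)
{
    uint32_t base_family = (eax >> 8) & 0xF;   /* EAX[11:8]  */
    uint32_t base_model  = (eax >> 4) & 0xF;   /* EAX[7:4]   */
    uint32_t ext_family  = (eax >> 20) & 0xFF; /* EAX[27:20] */
    uint32_t ext_model   = (eax >> 16) & 0xF;  /* EAX[19:16] */
    struct cpu_signature sig;

    sig.stepping = eax & 0xF;                  /* EAX[3:0]   */
    /* The extended family is added only when the base family is 0xF. */
    sig.family = (base_family == 0xF) ? base_family + ext_family : base_family;
    /* The extended model is prepended for base family 0x6 or 0xF. */
    sig.model = (base_family == 0x6 || base_family == 0xF)
                    ? ((ext_model << 4) | base_model)
                    : base_model;
    return sig;
}
```

The vendor string from "CPUID, eax=0x00000000" would still be needed to pick the right table; that part is left out here.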

In addition, for all CPUs you should trawl through the errata sheets to see if CPUID provides wrong information, and implement work-arounds to correct that wrong information.

Note that (depending on which range of CPUs you support and how awesome you want your code to be) it can take several months of work just to extract reliable information about caches.

Upvotes: 1

Dahui

Reputation: 146

You can get the CPU L1, L2 and L3 cache sizes with the CPUID instruction. According to the Intel x86 Software Developer's Manual, Volume 2 (Instruction Set Reference), cache information is returned by the CPUID instruction with EAX equal to 2 or 4. EAX=2 is the older method, and newer CPUs do not seem to use it, so I will describe the EAX=4 case.

Its output format is:

[Images: CPUID4_1, CPUID4_2]

So you can calculate the cache size with the following formula:

Cache size = (Ways + 1) * (Partitions + 1) * (Line_Size + 1) * (Sets + 1) or

Cache size = (EBX[31:22] + 1) * (EBX[21:12] + 1) * (EBX[11:0] + 1) * (ECX + 1)

For example, I ran the "cpuid -li" command on my Ubuntu system and got the following output:

   deterministic cache parameters (4):
  --- cache 0 ---
  cache type                           = data cache (1)
  cache level                          = 0x1 (1)
  self-initializing cache level        = true
  fully associative cache              = false
  extra threads sharing this cache     = 0x1 (1)
  extra processor cores on this die    = 0x7 (7)
  system coherency line size           = 0x3f (63)
  physical line partitions             = 0x0 (0)
  ways of associativity                = 0x7 (7)
  ways of associativity                = 0x0 (0)
  WBINVD/INVD behavior on lower caches = false
  inclusive to lower caches            = false
  complex cache indexing               = false
  number of sets - 1 (s)               = 63
  --- cache 1 ---
  cache type                           = instruction cache (2)
  cache level                          = 0x1 (1)
  self-initializing cache level        = true
  fully associative cache              = false
  extra threads sharing this cache     = 0x1 (1)
  extra processor cores on this die    = 0x7 (7)
  system coherency line size           = 0x3f (63)
  physical line partitions             = 0x0 (0)
  ways of associativity                = 0x7 (7)
  ways of associativity                = 0x0 (0)
  WBINVD/INVD behavior on lower caches = false
  inclusive to lower caches            = false
  complex cache indexing               = false
  number of sets - 1 (s)               = 63
  --- cache 2 ---
  cache type                           = unified cache (3)
  cache level                          = 0x2 (2)
  self-initializing cache level        = true
  fully associative cache              = false
  extra threads sharing this cache     = 0x1 (1)
  extra processor cores on this die    = 0x7 (7)
  system coherency line size           = 0x3f (63)
  physical line partitions             = 0x0 (0)
  ways of associativity                = 0x3 (3)
  ways of associativity                = 0x0 (0)
  WBINVD/INVD behavior on lower caches = false
  inclusive to lower caches            = false
  complex cache indexing               = false
  number of sets - 1 (s)               = 1023
  --- cache 3 ---
  cache type                           = unified cache (3)
  cache level                          = 0x3 (3)
  self-initializing cache level        = true
  fully associative cache              = false
  extra threads sharing this cache     = 0xf (15)
  extra processor cores on this die    = 0x7 (7)
  system coherency line size           = 0x3f (63)
  physical line partitions             = 0x0 (0)
  ways of associativity                = 0xb (11)
  ways of associativity                = 0x6 (6)
  WBINVD/INVD behavior on lower caches = false
  inclusive to lower caches            = true
  complex cache indexing               = true
  number of sets - 1 (s)               = 12287

L1 data cache size is: (7+1)*(0+1)*(63+1)*(63+1) = 32768 bytes = 32K

L3 cache size is: (11+1)*(0+1)*(63+1)*(12287+1) = 9437184 bytes = 9M
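The same arithmetic can be written as a small C helper that takes the raw EBX and ECX values returned for one sub-leaf of CPUID leaf 4. This is only a sketch of the formula above; the function name is my own, and it does not execute CPUID itself.

```c
#include <stdint.h>

/* Cache size in bytes from CPUID leaf 4 registers (hypothetical helper).
   EBX[31:22] = ways - 1, EBX[21:12] = partitions - 1,
   EBX[11:0]  = line size - 1, ECX = sets - 1. */
static inline uint64_t leaf4_cache_size(uint32_t ebx, uint32_t ecx)
{
    uint64_t ways       = ((ebx >> 22) & 0x3FF) + 1;
    uint64_t partitions = ((ebx >> 12) & 0x3FF) + 1;
    uint64_t line_size  = (ebx & 0xFFF) + 1;
    uint64_t sets       = (uint64_t)ecx + 1;

    return ways * partitions * line_size * sets;
}
```

With the L1 data cache values from the listing (ways 7, partitions 0, line size 63, sets 63) EBX would be 0x01C0003F and ECX 63, which gives 32768 bytes.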

Upvotes: 3

Marat Dukhan basically gave you the right answer. For newer Intel processors, meaning those made in the last 5-6 years, the best solution is to enumerate CPUID leaf 4: you call CPUID several times, first with EAX=4 and ECX=0, then with EAX=4 and ECX=1, and so forth. This returns not only the cache sizes and types but also tells you how these caches are connected to the CPU cores and hyperthreading/SMT units. The algorithm and sample code are given at https://software.intel.com/en-us/articles/intel-64-architecture-processor-topology-enumeration/ , more specifically in the section titled "Cache Topology Enumeration".
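A minimal C sketch of that enumeration loop is below. It is not the code from the Intel article. It assumes a GCC or Clang toolchain, whose <cpuid.h> header provides __get_cpuid_count, and the function names and the cap of 32 sub-leaves are my own choices. The loop stops when the cache type field (EAX[4:0]) reads 0, which is how leaf 4 signals that there are no more caches.

```c
#include <stdint.h>
#include <stdio.h>
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h> /* GCC/Clang helper header (toolchain assumption) */
#endif

/* EAX[4:0]: 0 = no more caches, 1 = data, 2 = instruction, 3 = unified. */
static inline uint32_t leaf4_type(uint32_t eax)  { return eax & 0x1F; }
/* EAX[7:5]: cache level (1, 2, 3, ...). */
static inline uint32_t leaf4_level(uint32_t eax) { return (eax >> 5) & 0x7; }

/* Walks the leaf 4 sub-leaves, prints each cache, and returns how many
   were found. Returns 0 if leaf 4 is unavailable or the target is not x86. */
static inline int print_leaf4_caches(void)
{
    int count = 0;
#if defined(__i386__) || defined(__x86_64__)
    for (unsigned int sub = 0; sub < 32; sub++) { /* 32 is an arbitrary cap */
        unsigned int eax, ebx, ecx, edx;

        if (!__get_cpuid_count(4, sub, &eax, &ebx, &ecx, &edx))
            break; /* leaf 4 is not supported on this CPU */
        if (leaf4_type(eax) == 0)
            break; /* no more caches to report */

        /* Same formula as for a single sub-leaf: ways * partitions * line * sets. */
        uint64_t size = (uint64_t)(((ebx >> 22) & 0x3FF) + 1)
                      * (((ebx >> 12) & 0x3FF) + 1)
                      * ((ebx & 0xFFF) + 1)
                      * ((uint64_t)ecx + 1);

        printf("L%u cache, type %u: %llu bytes\n",
               (unsigned)leaf4_level(eax), (unsigned)leaf4_type(eax),
               (unsigned long long)size);
        count++;
    }
#endif
    return count;
}
```

Calling print_leaf4_caches() from main prints one line per cache on CPUs that implement leaf 4. The sharing information mentioned above (which threads and cores share each cache) lives in the upper bits of EAX and is not decoded in this sketch.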

Upvotes: 4
