dilogical.cyclolith

Reputation: 39

How and where are instructions cached?

If I were to run a binary executable program, do all of the instructions have to be stored in the instruction cache? Is the instruction cache part of one of the L1, L2, or L3 CPU caches, or is it a separate entity? I am not very clear on what physically happens when I run an executable, and any clarity would be greatly appreciated.

Upvotes: 1

Views: 1789

Answers (2)

Peter Cordes

Reputation: 363932

Your executable on disk is logically memory-mapped into the virtual address space of a new process.
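As a rough user-space analogy (not what the kernel's program loader literally does), you can map a file yourself and see that its bytes become readable through memory without an explicit read of the whole file. This is a minimal sketch; `peek_mapped` is a hypothetical helper name, and using `sys.executable` as the file to map is just a convenient assumption.

```python
import mmap
import sys


def peek_mapped(path, count=4):
    """Map a file read-only and return its first `count` bytes."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file. The OS typically sets up the
        # mapping lazily instead of copying the file into memory here.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            # Slicing touches the first page of the mapping, which is
            # the point where that page's contents must be available.
            return view[:count]


if __name__ == "__main__":
    # sys.executable is the interpreter binary. On Linux this usually
    # shows the ELF magic bytes, but the result depends on the platform.
    print(peek_mapped(sys.executable))
```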

As usual with virtual memory, touching a 4k page of the executable may trigger a page fault if the data isn't actually in DRAM (hard page fault), or if it is but the HW page tables haven't "wired" that page (soft page fault). Soft page faults might still happen on each repeated run because OSes are typically "lazy" about wiring up page tables, even when file data is in the kernel's pagecache.
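The hard vs. soft distinction can be sketched with a toy model. This is only an illustration under simplifying assumptions: `ToyProcess` is a made-up class, the pagecache and page table are plain sets of page numbers, and no real kernel is structured this way.

```python
PAGE_SIZE = 4096  # 4 KiB pages, as on x86


class ToyProcess:
    """Toy model of lazy page-table wiring for one process."""

    def __init__(self, pagecache):
        self.pagecache = pagecache  # pages whose file data is in DRAM (shared)
        self.page_table = set()     # pages already wired for this process
        self.soft_faults = 0
        self.hard_faults = 0

    def touch(self, addr):
        page = addr // PAGE_SIZE
        if page in self.page_table:
            return "hit"
        if page in self.pagecache:
            # Data is in DRAM already; only the mapping is missing.
            self.soft_faults += 1
            outcome = "soft fault"
        else:
            # Data has to come from disk first.
            self.hard_faults += 1
            self.pagecache.add(page)
            outcome = "hard fault"
        self.page_table.add(page)
        return outcome


if __name__ == "__main__":
    pagecache = {0}                  # page 0 is already cached in DRAM
    first_run = ToyProcess(pagecache)
    print(first_run.touch(0x0010))   # soft fault
    print(first_run.touch(0x0020))   # hit
    print(first_run.touch(0x1000))   # hard fault
    second_run = ToyProcess(pagecache)
    print(second_run.touch(0x1000))  # soft fault
```

The second process starts with an empty page table, so it still takes a soft fault on a page that the first run already pulled into the pagecache.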

(x86 uses 4 KiB virtual memory pages. It's a common page size, but some other ISAs do or can use other page sizes like 8k or 16k.)
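With 4 KiB pages, the low 12 bits of an address are the offset within the page and the rest is the page number. A small sketch of that arithmetic, with hypothetical helper names and example addresses picked only for illustration:

```python
PAGE_SHIFT = 12              # log2(4096)
PAGE_SIZE = 1 << PAGE_SHIFT  # 4 KiB


def split_address(vaddr):
    """Return (virtual page number, offset within the page)."""
    return vaddr >> PAGE_SHIFT, vaddr & (PAGE_SIZE - 1)


def pages_spanned(start, length):
    """Number of pages touched by `length` bytes starting at `start`."""
    if length <= 0:
        return 0
    first = start >> PAGE_SHIFT
    last = (start + length - 1) >> PAGE_SHIFT
    return last - first + 1


if __name__ == "__main__":
    page, offset = split_address(0x401234)
    print(hex(page), hex(offset))       # 0x401 0x234
    print(pages_spanned(0x401FF0, 32))  # 2: the range crosses a page boundary
```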

The kernel may optimize by making sure at least the page containing the process entry point is loaded from disk and wired up before entering user-space. Otherwise it will just page-fault right away back to the kernel. For example, very early Linux, like 0.11, worked this way, as a get-it-working step in development before making it good.


When data is in DRAM, execution works by the CPU doing code-fetch through L1i / L2 / L3 cache. This is exactly like data loads, but goes through L1 I-cache instead of L1 D-cache, assuming a machine with split L1 caches. Outer levels of cache (assuming they exist) are almost always unified. Modern x86 CPUs and other high-end chips like POWER typically have 3 levels of cache: usually L1i/d and L2 are per-core private, with a large shared L3. Otherwise you might have just private L1i/L1d and shared L2, or in a single-core system maybe just 1 or 2 levels of cache.
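To make the split-L1 / unified-outer-levels idea concrete, here is a toy lookup model. It is a sketch under heavy assumptions: `ToyHierarchy` is a made-up class, caches are unbounded sets (no capacity, associativity, or eviction), and every miss fills all levels, which is only one of several inclusion policies real CPUs use.

```python
LINE_SIZE = 64  # bytes per cache line


class ToyHierarchy:
    """Split L1i/L1d in front of unified L2 and L3, with no evictions."""

    def __init__(self):
        self.l1i = set()
        self.l1d = set()
        self.l2 = set()
        self.l3 = set()

    def _access(self, l1, addr):
        line = addr // LINE_SIZE
        if line in l1:
            return "L1"
        if line in self.l2:
            source = "L2"
        elif line in self.l3:
            source = "L3"
        else:
            source = "DRAM"
        # Fill every level on the way in (a simplifying assumption).
        self.l3.add(line)
        self.l2.add(line)
        l1.add(line)
        return source

    def fetch_code(self, addr):
        return self._access(self.l1i, addr)

    def load_data(self, addr):
        return self._access(self.l1d, addr)


if __name__ == "__main__":
    h = ToyHierarchy()
    print(h.fetch_code(0x1000))  # DRAM: nothing is cached yet
    print(h.fetch_code(0x1004))  # L1: same 64-byte line, now in L1i
    print(h.load_data(0x1008))   # L2: not in L1d, but the unified L2 has it
```

The last access shows the point of the split: a data load to a line that was only ever fetched as code misses in L1d but hits in the unified L2.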

These caches have a line size of 64 bytes on most CPUs (including all x86 CPUs since P4 / Core 2 or so). Cache misses don't fault; the core just has to wait for the line to arrive. If the out-of-order back end still has earlier instructions in flight, it can keep executing them while a code-cache miss is outstanding. Otherwise, the CPU is stuck with nothing at all to do.
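This also answers the "do all instructions have to be in the cache" part: no, lines are brought in on demand and older ones get evicted. A toy simulation, assuming a direct-mapped cache with only 8 lines and fixed 4-byte instructions (real L1i caches are set-associative and much larger, and x86 instructions are variable-length); `ToyICache` and `run_straight_line` are made-up names:

```python
LINE_SIZE = 64  # bytes per cache line


class ToyICache:
    """Direct-mapped instruction cache that only counts hits and misses."""

    def __init__(self, num_lines=8):
        self.num_lines = num_lines
        self.tags = [None] * num_lines  # which line each slot holds
        self.hits = 0
        self.misses = 0

    def fetch(self, addr):
        line = addr // LINE_SIZE
        index = line % self.num_lines
        if self.tags[index] == line:
            self.hits += 1
            return True
        # Miss: the core waits for the line, which replaces the old one.
        self.tags[index] = line
        self.misses += 1
        return False


def run_straight_line(cache, start, nbytes, insn_size=4):
    """Fetch every instruction in [start, start + nbytes) once, in order."""
    for addr in range(start, start + nbytes, insn_size):
        cache.fetch(addr)


if __name__ == "__main__":
    cache = ToyICache(num_lines=8)      # 8 * 64 = 512 bytes of cache
    run_straight_line(cache, 0, 1024)   # 1024 bytes of code, twice the cache
    print(cache.misses, cache.hits)     # 16 240
```

The code is twice the size of the cache and still runs: each of the 16 lines misses once, and the other 15 fetches within that line hit.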

(TLB misses are also a thing. Most ISAs have hardware page-walk that makes it transparent to software, but an iTLB miss can similarly stall instruction fetch leaving the CPU without any queued up work to do. So i-cache / i-TLB misses are even worse than data-cache / d-TLB misses where miss-under-miss / hit-under-miss, and out-of-order execution, can let some useful work continue.)


In modern x86 CPUs, decoded instructions are cached in a small, very fast "uop cache", as well as caching the machine-code bytes in L1i cache.

Pentium 4 didn't have an L1i cache, only a trace cache, but that didn't work out very well. And it didn't have the transistor or power budget for enough decoders to build traces fast on trace-cache misses. This was one of several major downsides of the NetBurst microarchitecture.

Upvotes: 4

0x4d45

Reputation: 722

The instructions are stored permanently in the binary executable file on your mass storage device. When the program is executed, the contents of the executable file are loaded into primary memory (RAM). Generally speaking, all traffic between the CPU and RAM goes through a cache, and that includes the instructions contained in the executable.

A cache may contain instructions, data, or both. In the case of Intel CPUs, there are two L1 caches: L1d for data and L1i for instructions; L2 and L3 are unified, holding both instructions and data. The following image illustrates this quite clearly:

[Image: Cache hierarchy]

That is the short answer. If you wish to learn more, there is plenty of good material around. Ulrich Drepper's "What Every Programmer Should Know About Memory" is a great read, for one.

Upvotes: 4
