Reputation: 21615
We know that caches use virtual addresses. So how does this work when multiple processes are involved? This applies especially to shared caches, such as a shared L2 cache, but also to a core-local L1 cache: when processes are switched, or with simultaneous multithreading (hyperthreading), you could have threads from two different processes running on the same physical core. Is hyperthreading any good when threads from different processes are involved, or can it only boost performance when threads of the same process are involved?
Upvotes: 7
Views: 1660
Reputation: 364220
None of the major x86 CPU microarchitectures use virtually-addressed caches. They all use virtually-indexed / physically-tagged (VIPT) L1 caches. VIPT is a performance hack that allows tags from a set to be fetched in parallel with a TLB lookup.
The bits of the address that are used as the index are the same in the physical and virtual addresses. (i.e. they're part of the offset within a 4k page, so they don't need to be translated by the TLB). This means that it effectively behaves exactly like a phys/phys (PIPT) cache, avoiding all problems of virtual addressing.
This is made possible by keeping the cache small and giving it enough ways. Intel's L1 caches are 32 KiB, 8-way associative, with 64 B lines: 32 KiB / 8 ways = 4 KiB per way, so the set index and line offset together are exactly the 12 within-a-page address bits. (See other resources for diagrams and more detailed explanations.)
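To make the bit arithmetic concrete, here is a minimal C sketch, assuming Intel's 32 KiB / 8-way / 64 B-line L1d geometry from above (the example address is arbitrary). It extracts the line offset and set index and shows that both come entirely from bits [11:0], i.e. the offset within a 4 KiB page, which the TLB never changes:

```c
#include <stdio.h>
#include <stdint.h>

int main(void) {
    const unsigned line_size  = 64;                              /* bytes per line */
    const unsigned ways       = 8;
    const unsigned cache_size = 32 * 1024;                       /* 32 KiB total   */
    const unsigned sets       = cache_size / (line_size * ways); /* = 64 sets      */

    uint64_t vaddr = 0x7ffd1234abcdULL;            /* arbitrary example address    */

    uint64_t offset = vaddr & (line_size - 1);     /* bits [5:0]                   */
    uint64_t index  = (vaddr / line_size) % sets;  /* bits [11:6]                  */

    /* index + offset use bits [11:0] only, i.e. the page offset.
     * Bits [63:12] become the physical tag after TLB translation. */
    printf("sets=%u  line offset=0x%llx  set index=%llu\n",
           sets, (unsigned long long)offset, (unsigned long long)index);
    return 0;
}
```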
Hyperthreading works fine with separate processes, because x86 caches avoid aliasing (synonym / homonym) problems: they behave like physically-addressed caches. Two memory-intensive processes that don't share any memory might still run slower with hyperthreading than without, though: competitive sharing of the caches can be worse than just running one process after the other finishes, if that's an option.
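To illustrate the competitive-sharing point, here is a toy C sketch (the buffer sizes and the 32 KiB L1d / 64 B line geometry are assumptions, not measurements): each thread streams over its own 24 KiB buffer, which fits in L1d alone but not when two hyperthreads share the same L1d. To actually observe the effect you'd pin both threads to sibling logical CPUs of one physical core, e.g. with taskset (which logical CPU numbers are siblings depends on the machine), and compare against running them on separate cores.

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

/* Each working set is 24 KiB: fits a 32 KiB L1d alone, but two of them
 * sharing one core's L1d evict each other's lines. */
#define BUF_BYTES (24 * 1024)
#define PASSES    100000

static void *walker(void *arg)
{
    volatile unsigned char *buf = arg;   /* volatile: keep the loads */
    unsigned long sum = 0;
    for (int p = 0; p < PASSES; p++)
        for (int i = 0; i < BUF_BYTES; i += 64)   /* one access per 64 B line */
            sum += buf[i];
    (void)sum;
    return NULL;
}

int main(void)                            /* build with: cc -O2 -pthread */
{
    unsigned char *a = calloc(1, BUF_BYTES);
    unsigned char *b = calloc(1, BUF_BYTES);
    pthread_t t1, t2;

    pthread_create(&t1, NULL, walker, a);
    pthread_create(&t2, NULL, walker, b);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    free(a);
    free(b);
    return 0;
}
```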
For processes that bottleneck on something other than a resource that hyperthreading shares, HT certainly helps: e.g. on branch mispredicts, or on cache misses from unpredictable accesses to a big working set that would miss often even without hyperthreading.
CPUs that use virt/virt (VIVT) caches do need to invalidate them on context switches, or carry extra tags to keep track of which address space (process) each line belongs to. This is similar to what TLBs do to support virtualization: entries are tagged with a VM ID (such as Intel's VPID), so translations from different guests don't get confused with each other. A virt/virt L1 means you don't need a fast TLB: translation is only needed on L1 misses, so in effect the L1 cache is also caching translations.
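As a toy sketch of what those "extra tags" mean (not modeled on any real CPU): a virtually-tagged line that also stores an address-space ID, so a hit requires matching the current process's ASID and a context switch doesn't force a full flush:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Toy model of one line of an ASID-tagged virtually-addressed (VIVT) cache. */
struct vivt_line {
    bool     valid;
    uint16_t asid;   /* address space (process) that filled the line */
    uint64_t vtag;   /* virtual tag bits                             */
};

static bool lookup_hits(const struct vivt_line *line,
                        uint16_t cur_asid, uint64_t vtag)
{
    /* Without the asid compare, a line left behind by the previous process
     * could wrongly hit for the same virtual address in a new process
     * (the homonym problem), forcing a flush on every context switch. */
    return line->valid && line->asid == cur_asid && line->vtag == vtag;
}

int main(void)
{
    struct vivt_line line = { .valid = true, .asid = 1, .vtag = 0xabc };
    printf("same process:  %d\n", lookup_hits(&line, 1, 0xabc));  /* 1: hit  */
    printf("other process: %d\n", lookup_hits(&line, 2, 0xabc));  /* 0: miss */
    return 0;
}
```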
Presumably some designs use phys/phys L1, but I don't know any specific examples. The virt-index / phys-tag trick is pretty common in high-performance CPUs, because an L1 with enough ways to make it possible is just a good idea anyway.
Note that only L1 ever uses virt addresses. Big L2 and L3 caches are always phys/phys.
Other links:
http://www.realworldtech.com/forum/?threadid=76592&curpostid=76600 This whole thread goes into a lot of detail and questions about caches. David Kanter's posts tend to explain things in a readable way. I haven't read the whole thread. RWT forums are now searchable, so if you google for more details, you're likely to see hits from years of forum threads there.
Paul Clayton explains why phys-idx/virt-tag (PIVT) is such a bad idea that nobody would ever build one: it has the disadvantages of virtually-addressed caches without any of the advantages. (Wikipedia says the MIPS R6000 is the only known implementation, and gives the extremely esoteric reason: even a TLB would have been too large to implement in emitter-coupled logic, so they implemented a "TLB slice" that only translates enough bits for a physical index. Given that limitation, PIPT and VIPT were not options, and they decided to go PIVT instead of VIVT.)
Another very-detailed answer from Paul Clayton about caches.
Upvotes: 8