Matsmath

Reputation: 1224

Is there a way to avoid cache misses _completely_?

I read the very basics on how the cache works here: How and when to align to cache line size? and here: What is "cache-friendly" code?, but neither of these posts answered my question: is there a way to execute some code entirely within the cache, i.e., without any access to RAM (beyond perhaps the initial loading of the program from the HDD)? As far as I understand, the bottleneck in computation nowadays is mostly memory bandwidth, and "as long as you stay within the CPU, you are just fine".

Is there a way to load a program into the cache and keep it there until it terminates? Let's say I have a 1MB compiled C program which does some scientific computation with a memory requirement of another 1MB, and runs for 5 days. Is there a way to flag this code so that it never gets evicted from the cache during execution? I am thinking of giving this code higher priority, or something similar, while it runs.

In other words, how much cache is used by an idling computer, which loads its OS (say Ubuntu) and then does nothing? Is there excessive cache use during idling? Should I expect my small program to stay in the cache if the OS does nothing besides executing it? Let's say that after 5 minutes the screensaver starts. Does this lead to massive cache misses (and hence a drastic reduction in performance), since it now competes with my program for the cache space? My experience is that running several non-demanding programs (like the screensaver, a simple audio player, a PDF reader, etc.) at the same time does not significantly decrease the performance of my scientific program, even though I would expect it to be pushed in and out of the cache all the time. The question is: why is its speed not affected? Would it make sense to use an absolutely minimalistic OS (if so, which one?) to improve (or rather: maintain) the speed of the computation?

Just for clarity, we can assume that the code is something very simple, say a bunch of nested for loops whose innermost body sums up all the loop counters modulo 97. The point is that it is small enough to be loaded and executed entirely in the cache; see the sketch below.
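For concreteness, a hypothetical version of such a kernel (the loop bounds are purely illustrative):

```c
/* Hypothetical toy kernel: a few nested loops whose innermost body sums
 * the loop counters modulo 97. Both the code and its data easily fit in
 * the L1 cache; the bounds below are purely illustrative. */
#include <stdio.h>

int main(void)
{
    unsigned long long s = 0;
    for (int i = 0; i < 1000; i++)
        for (int j = 0; j < 1000; j++)
            for (int k = 0; k < 1000; k++)
                s += (unsigned long long)(i + j + k) % 97;
    printf("%llu\n", s);
    return 0;
}
```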

Upvotes: 7

Views: 4480

Answers (5)

Phil Miller

Reputation: 38158

Newer Intel CPUs have added "Cache Allocation Technology" (CAT) under the general rubric of their Resource Director Technology. This allows software directives to reserve certain cache (and other) resources for particular computational units (application, container, VM, etc). So, if the process in question has enough cache space set aside for it under CAT, it should experience only its initial compulsory misses (to bring its code and data into cache) and self-induced conflict misses, avoiding capacity misses and conflict misses created by other processes.
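On Linux, CAT is exposed through the resctrl filesystem. A minimal sketch, assuming a kernel with resctrl support and /sys/fs/resctrl already mounted; the group name "sci_job" and the way mask below are illustrative, not prescriptive:

```c
/* Sketch: reserve some L3 ways for the current process via Linux resctrl
 * (Intel RDT / CAT). Assumes resctrl is mounted at /sys/fs/resctrl; the
 * group name and the way mask are illustrative and machine-specific. */
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

static void write_file(const char *path, const char *text)
{
    FILE *f = fopen(path, "w");
    if (!f) { perror(path); exit(EXIT_FAILURE); }
    if (fputs(text, f) == EOF) { perror(path); exit(EXIT_FAILURE); }
    fclose(f);
}

int main(void)
{
    /* Create a new resource group (a directory under /sys/fs/resctrl). */
    mkdir("/sys/fs/resctrl/sci_job", 0755);

    /* Restrict this group to four L3 ways on cache domain 0.
     * The mask format and the number of available ways vary by machine. */
    write_file("/sys/fs/resctrl/sci_job/schemata", "L3:0=f\n");

    /* Assign the current process to the group. */
    char pid[32];
    snprintf(pid, sizeof pid, "%d\n", (int)getpid());
    write_file("/sys/fs/resctrl/sci_job/tasks", pid);

    /* ... run the long-lived computation here ... */
    return 0;
}
```

Note that to actually protect those ways from other processes, you would also shrink the default group's mask so it no longer overlaps them; otherwise everything else can still allocate into the same ways.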

Upvotes: 2

bare_metal

Reputation: 1194

I am not sure whether this will fully answer your questions.

is there a way to execute some code entirely within the cache, i.e., without using any access to RAM? Is there a way to load a program into the cache, and keep it there until it terminates?

It is possible to use a fully associative cache (e.g., tightly coupled memories), which has single-cycle access times (this is realistic only in very small embedded systems). It is general practice to use TCMs in embedded systems for time-critical code, as they provide predictability.

In the case of set-associative caches, it is possible to lock down cache lines or ways (e.g., using CP15 on ARM) so that the eviction algorithm does not consider them as victims for a cache fill.

As a side note, it is also sometimes useful to use cache-as-RAM for bring-up of non-booting boards when the caches are in debug mode. (http://www.asset-intertech.com/Products/Processor-Controlled-Test/PCT-Software/Cache-as-RAM-for-board-bring-up-of-non-boothing-ci)

Upvotes: 1

chrk

Reputation: 4215

There are different types of CPU cache misses: compulsory, conflict, capacity, coherence.

Compulsory misses can't be avoided, as they happen on the first reference to a location in memory. So no, you definitely can't avoid cache misses completely.

Besides that, typical L1 cache sizes today are 32KB/64KB per core, and L2 cache sizes are 256KB per core. So 1MB of data would also create capacity or conflict misses, depending on the cache's associativity.
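A rough way to see this on Linux (a sketch, assuming gcc and perf are available; the buffer sizes are illustrative): stream over a 1MB buffer repeatedly and compare the L1 miss rate against a run with a buffer that fits in L1.

```c
/* Sketch (assumptions: Linux, gcc, perf installed): repeatedly stream over
 * a buffer. With BUF_BYTES = 1 MB the working set exceeds a typical L1/L2;
 * with BUF_BYTES = 16 KB it fits in L1. Compare the miss *rate* between the
 * two runs with e.g.
 *   perf stat -e L1-dcache-loads,L1-dcache-load-misses ./a.out */
#include <stdio.h>
#include <stdlib.h>

#define BUF_BYTES (1 << 20)   /* 1 MB; try (1 << 14) for an L1-resident run */

int main(void)
{
    size_t n = BUF_BYTES / sizeof(int);
    int *buf = calloc(n, sizeof(int));
    if (!buf) return 1;

    long long sum = 0;
    for (int pass = 0; pass < 10000; pass++)
        for (size_t i = 0; i < n; i++)
            sum += buf[i];    /* the values don't matter, only the memory traffic */

    printf("%lld\n", sum);
    free(buf);
    return 0;
}
```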

Upvotes: 6

Oliver Charlesworth

Reputation: 272727

No, on most standard architectures, CPU cache is not addressable.*

And even if you could, what kind of performance improvement are you anticipating here? What percentage of your program's execution time do you believe is being spent loading from main memory into (L3) cache? You should profile your program to determine where it's actually spending its time, rather than dreaming up solutions to problems that don't exist!


* I think x86 CPUs might have a hardware configuration which allows them to operate without attached RAM, but that's basically irrelevant.

Upvotes: 4

Steve

Reputation: 11973

Short answer: NO. The cache is maintained by the OS/CPU, and it would be a bad idea to allow programs to force themselves to stay in the cache. Say you have two programs running at the same time, both trying to force themselves to stay in the cache: chaos would ensue, wouldn't it?

Upvotes: 3
