Are cache misses correlated to whether we use heap memory?

Question

To achieve best performance, try to minimize cache misses. I guess we can all agree on that.

What I suggest and would like to ask about is the following. I say that this:

template 
struct {
    T* a;
    T* b;
    T* c;
};

is more vulnerable to cache misses than this:

template 
struct {
    T a;
    T b;
    T c;
};

Often I make the argument: Minimize heap allocations to minimize cache misses. Am I wrong about this?

Rationale: From my work emulators (I wrote the emulator of a PowerPC including the MMU): Memory is pulled in pages or blocks. If you allocate everything on the stack, the compiler will have a better chance in getting everything in a contiguous memory chunk, which means that pulling a single page/block will contain your whole struct/class (assuming you're not using gigantic structs/classes), and hence you'll have less cache misses.

I don't fully understand cache-lines in modern CPUs when people mention them (and I don't know whether it refers to simply the page table walk process on multiple cache levels). Some people told me my argument is incorrect for that, and I didn't understand what they meant. Can someone please tell me whether my argument is correct/incorrect and whether it's wrong for a particular reason in x86/x64 architectures?

Matthieu Brucher · Accepted Answer

On the stack or on the heap, you will get cache misses. So no, it's not about minimizing heap allocations.

The question is how can the processor reuse cache information as much as possible and predict where you want to go. That's why a vector is better than a list, because you are going through your data in a predictable manner (compared to a list or a map).

So the question is, is your struct a struct of say 3 float that get allocated on the heap, or arrays of float? If it's the first, then bad, use the data itself rather than pointers, if it's the latter, good, you have locality if you loop over each array.

The 3 ground rules are locality, locality, locality.

Then there is the whole discussion about Arrays of Structures (AoS), Structures of Arrays (SoA, usually better when not all entries are useful for computations) and Arrays of Structures of Arrays (AoSoA, with vectorized code, the last arrays would be packed floats/integers...).

Are cache misses correlated to whether we use heap memory?

Answers (1)

Related Questions