Reputation: 26356
To achieve best performance, try to minimize cache misses. I guess we can all agree on that.
What I suggest and would like to ask about is the following. I say that this:
template <typename T>
struct {
T* a;
T* b;
T* c;
};
is more vulnerable to cache misses than this:
template <typename T>
struct {
T a;
T b;
T c;
};
Often I make the argument: Minimize heap allocations to minimize cache misses. Am I wrong about this?
Rationale: From my work emulators (I wrote the emulator of a PowerPC including the MMU): Memory is pulled in pages or blocks. If you allocate everything on the stack, the compiler will have a better chance in getting everything in a contiguous memory chunk, which means that pulling a single page/block will contain your whole struct/class (assuming you're not using gigantic structs/classes), and hence you'll have less cache misses.
I don't fully understand cache-lines in modern CPUs when people mention them (and I don't know whether it refers to simply the page table walk process on multiple cache levels). Some people told me my argument is incorrect for that, and I didn't understand what they meant. Can someone please tell me whether my argument is correct/incorrect and whether it's wrong for a particular reason in x86/x64 architectures?
Upvotes: 1
Views: 1362
Reputation: 22023
On the stack or on the heap, you will get cache misses. So no, it's not about minimizing heap allocations.
The question is how can the processor reuse cache information as much as possible and predict where you want to go. That's why a vector
is better than a list, because you are going through your data in a predictable manner (compared to a list or a map).
So the question is, is your struct
a struct of say 3 float
that get allocated on the heap, or arrays of float
? If it's the first, then bad, use the data itself rather than pointers, if it's the latter, good, you have locality if you loop over each array.
The 3 ground rules are locality, locality, locality.
Then there is the whole discussion about Arrays of Structures (AoS), Structures of Arrays (SoA, usually better when not all entries are useful for computations) and Arrays of Structures of Arrays (AoSoA, with vectorized code, the last arrays would be packed floats/integers...).
Upvotes: 3