Reputation: 504
I have a pointer to a vector which is stored in some other object.
vector<Thing>* m_pThings;
Then when I want to iterate through this vector, I use the following for
loop:
for (auto& aThing : *m_pThings){
aThing.DoSomething();
}
Assume that Thing::DoSomething() exists.
My question is: does this code cause extra cache misses when dereferencing m_pThings? I ask because I deliberately used a vector so that all the Things are kept contiguously in memory to avoid cache misses, and I want to know whether dereferencing a pointer like this causes cache misses anyway, undoing that effort.
Upvotes: 1
Views: 928
Reputation: 2505
I just want to say that the difference between handing a range-based for loop the vector itself and handing it a dereferenced pointer is, at worst, a one-time penalty of a few clock cycles. A range-based for loop works by calling begin() and end() on the object you pass in, which return iterators; it then does the rest of the work with just those iterators, which are identical in both cases. There is no repeated overhead from dereferencing the pointer.
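For reference, your range-based for over *m_pThings behaves roughly like the hand-written loop below (a simplified sketch of what the compiler generates, not the exact wording of the standard); the pointer is dereferenced only once, up front:

// Roughly equivalent to: for (auto& aThing : *m_pThings) { aThing.DoSomething(); }
{
    auto& range = *m_pThings;      // the pointer is dereferenced exactly once, here
    auto it  = range.begin();      // begin() and end() are each called once
    auto end = range.end();
    for (; it != end; ++it) {
        auto& aThing = *it;        // from here on, only the iterator is used
        aThing.DoSomething();
    }
}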
On modern processors, almost anything you do is unlikely to be too cache-unfriendly, unless you're addressing memory in the pattern of some bizarre n-th degree polynomial. Even if your program rapidly jumps between a few disparate addresses, the cache will usually handle such behavior very easily, especially if some of those addresses are stable, such as your m_pThings pointer.
In the end, unless you're designing a pattern that will be used all throughout a sprawling program, choices like these don't really need an in-depth examination up-front. Premature optimization is evil, and should be avoided at all costs! If it proves to be an issue down the line, your profiler will alert you to the fact that this particular design is gobbling up cycles and should be re-examined.
If you really want to know, though, benchmarking is your best bet. No amount of code analysis can beat a simple benchmark for questions like these, which depend heavily on the particular hardware implementation. Not all caches behave the same way, so while one CPU might handle an algorithm without breaking a sweat, another CPU might burn itself up.
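If you do want to measure it, a minimal microbenchmark along these lines will answer the question for your hardware (the Thing below and its DoSomething() are stand-ins, and the absolute numbers only mean anything on your own machine and compiler settings):

#include <chrono>
#include <iostream>
#include <vector>

struct Thing {                       // stand-in for your Thing
    int value = 0;
    void DoSomething() { ++value; }  // trivial but observable work
};

int main() {
    std::vector<Thing> things(1000000);
    std::vector<Thing>* m_pThings = &things;

    auto t0 = std::chrono::steady_clock::now();
    for (auto& aThing : *m_pThings)  // iterate through the pointer
        aThing.DoSomething();
    auto t1 = std::chrono::steady_clock::now();
    for (auto& aThing : things)      // iterate over the vector directly
        aThing.DoSomething();
    auto t2 = std::chrono::steady_clock::now();

    long long sum = 0;               // read the results so the loops can't be optimized away
    for (auto& aThing : things) sum += aThing.value;

    using us = std::chrono::microseconds;
    std::cout << "via pointer: " << std::chrono::duration_cast<us>(t1 - t0).count() << " us\n"
              << "direct:      " << std::chrono::duration_cast<us>(t2 - t1).count() << " us\n"
              << "(checksum "    << sum << ")\n";
    // Note: the first loop also warms the cache for the second; a real benchmark
    // would repeat and interleave the runs many times and compare distributions.
}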
So my answer basically boils down to: Do what you think is the most clear and/or convenient, because it probably doesn't actually matter. If it proves to be an issue, you'd probably be better off rethinking the algorithm itself, rather than improving cache performance. The cache should be thought of as a bonus, rather than a necessity. If your code only runs well with some particular cache behavior, you're doing it wrong*.
*The one exception I would make is if you're programming for a single specific target, like a game console or some other proprietary embedded system.
Upvotes: 5
Reputation: 19706
It's going to cause cache misses the first time you go over it, and if too much time passes between that and the next reuse of the same vector, it may get flushed out of the cache.
The benefit of contiguous storage comes mostly from two things. First, if the elements are smaller than a cache line, fetching one element also brings in the next few in the same go, so you're saving memory bandwidth (which translates to latency if your memory is stressed by many requests and you run out of buffers).
The second benefit comes from hardware prefetchers, which can run ahead over contiguous memory ranges and reduce the latency of later demand accesses.
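As a rough illustration of the first benefit (the 16-byte Thing and the 64-byte line size below are assumptions; check sizeof(Thing) and your CPU's actual line size):

#include <cstddef>

struct Thing { float x, y, z, w; };          // hypothetical 16-byte Thing

constexpr std::size_t kCacheLineSize = 64;   // typical line size, but not guaranteed
static_assert(sizeof(Thing) == 16, "layout assumed for this example");

// A miss that fetches things[i] also pulls in the rest of its cache line,
// i.e. 64 / 16 = 4 contiguous Things per memory fetch:
constexpr std::size_t kThingsPerLine = kCacheLineSize / sizeof(Thing);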
You didn't specify the size of Thing, the reuse distance of that array, or whether it's small enough to fit in the cache (and if so, at what level); these parameters determine which of the above benefits applies.
Either way, the fact that you dereference the array through a pointer doesn't hurt its contiguous placement: the hardware will still see the addresses being accessed one after the other and can use that when caching or prefetching. Note, however, that this still doesn't protect you from cache thrashing.
Upvotes: 1
Reputation: 30489
For for (auto& aThing : *m_pThings), using vector<Thing>* vs. vector<Thing> would hardly change the performance: either way you end up referring to the same vector object, which merely contains a few pointers (let's assume they are start, end and end_of_allocation) to the actual data. The major performance impact comes from the layout of the vector's data, not from the vector's own raw pointers.
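In other words, a typical vector looks roughly like the sketch below (implementation-specific, not standard-mandated; the member names are illustrative), so going through vector<Thing>* only adds one extra hop before you reach the same three internal pointers:

// Simplified sketch of a common std::vector representation:
template <class T>
struct VectorSketch {
    T* start;               // first element of the contiguous heap buffer
    T* end;                 // one past the last constructed element
    T* end_of_allocation;   // one past the end of the allocated capacity
};
// m_pThings ---> [start | end | end_of_allocation] ---> contiguous Thing buffer
// &things   ---> [start | end | end_of_allocation] ---> same contiguous buffer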
But for readability, you should prefer to use vector<Thing>.
Upvotes: 1