Reputation: 3382
Let's say we have a long array named DataList. We also have two other arrays that contain indexes: one holds the indexes in ascending order (i.e. 0, 1, 2, 3, 4, ...) and is named sIndexes; the other holds indexes in random order (i.e. 6, 5, 1, 9, 7, ...) and is named rIndexes.
These index arrays (sIndexes and rIndexes) are used to index the DataList array. When we use the sIndexes array to index DataList, it accesses the elements sequentially. When we use rIndexes to index DataList, it accesses random places in the array.
So my question is: is there any performance difference between random indexing and sequential indexing? Won't cache misses contribute to a performance loss (e.g. when an index points to a location that is not in any cached line)?
Upvotes: 2
Views: 886
Reputation: 5467
Linear access is much faster, because:
- each cache line loaded brings in several adjacent elements, so most accesses are hits, and
- the hardware prefetcher detects the sequential pattern and fetches upcoming lines before they are needed.
Random access will be much slower, unless your whole array fits into L2 cache, or you do a lot of work per item. An L2 cache miss is usually quite expensive.
For details I recommend the classic What Every Programmer Should Know About Memory[1], a long and fascinating read. It's from 2007 but CPU architecture has not fundamentally changed; if anything this has become more relevant.
The above is true for modern, large CPUs. Small embedded CPUs exist that have no cache but predictable SRAM access; on those, random indexing performs the same as sequential.
[1] https://www.gwern.net/docs/cs/2007-drepper.pdf
Upvotes: 1
Reputation: 117318
What happens "under the hood" is likely a multiplication (index * sizeof(data)), and that takes the same amount of time for the types involved no matter what the values are.
As you yourself point out, actually fetching the value from the array may take longer if the part of the array being accessed isn't in a cached line.
For a single-threaded application, one usually wants to pack the data tightly to promote cache hits (see the idea behind std::hardware_constructive_interference_size), but for multithreaded applications one often tries to keep the elements apart (std::hardware_destructive_interference_size) to avoid false sharing and, hopefully, let each thread have its cache line undisturbed by the others.
Upvotes: 1