D-RAJ

Reputation: 3382

Does randomly indexing an array have any performance implications compared to sequential indexing?

Let's say we have a long array named DataList, and two other arrays that contain indexes: one holds the indexes in ascending order (i.e. 0, 1, 2, 3, 4, ...), named sIndexes; the other holds the indexes in random order (i.e. 6, 5, 1, 9, 7, ...), named rIndexes.
These index arrays (sIndexes and rIndexes) are used to index into DataList. When we use sIndexes, the elements are accessed sequentially; when we use rIndexes, the accesses land at random places in the array.

So my question is:
Is there any performance difference between random indexing and sequential indexing? Won't cache misses contribute to a performance loss (e.g. when an index points to a location that isn't in a currently cached line)?

Upvotes: 2

Views: 886

Answers (2)

maxy

Reputation: 5467

Linear access is much faster, because:

  • The minimum size to fetch from RAM is one cache-line (usually 64 bytes). If you only use one byte from that, you are wasting precious memory bandwidth.
  • Modern CPUs can detect regular access patterns and will pre-fetch data from RAM, so it will be cached before you even access it. (Or at least it will already be streaming at maximum memory bandwidth.)
  • If linear access is detected at compile-time (probably not the case for your code), it can be vectorized into SIMD instructions, processing many items at once.

Random access will be much slower unless your whole array fits into the L2 cache, or you do a lot of work per item. An L2 cache miss is usually quite expensive.

For details I recommend the classic What Every Programmer Should Know About Memory[1], a long and fascinating read. It's from 2007 but CPU architecture has not fundamentally changed; if anything this has become more relevant.

The above is true for modern, large CPUs. Small embedded CPUs exist that have no cache but predictable SRAM access; on those, random indexing performs the same as sequential.

[1] https://www.gwern.net/docs/cs/2007-drepper.pdf

Upvotes: 1

Ted Lyngmo

Reputation: 117318

What happens "under the hood" is likely an address calculation, base + index * sizeof(element), and that takes the same amount of time for the types involved no matter what the index values are.

As you yourself point out, actually fetching the value from the array may take longer if the part of the array being accessed isn't already in the cache.

For a single-threaded application, one usually wants to pack the data tightly to promote cache hits (see the idea behind std::hardware_constructive_interference_size). For multithreaded applications, one often keeps the elements apart instead (std::hardware_destructive_interference_size) to avoid false sharing and, hopefully, let each thread have its cache line undisturbed by the others.

Upvotes: 1
