Reputation: 1664
I performed a small test to determine behavior of accessing a vector of pointers vs vector of values. It turns out that for small memory blocks both perform equally well, however, for large memory blocks there is a significant difference.
What is the explanation for such behavior?
For the code below, performed on my pc, the difference for D=0 is about 35% and for D=10 it is unnoticeable.
int D = 0;
int K = 1 << (22 - D);
int J = 100 * (1 << D);
int sum = 0;
std::vector<int> a(K);
std::iota(a.begin(), a.end(), 0);
long start = clock();
for (int j = 0; j < J; ++j)
for (int i = 0; i < a.size(); ++i)
sum += a[i];
std::cout << double(clock() - start) / CLOCKS_PER_SEC << " " << sum << std::endl;
sum = 0;
std::vector<int*> b(a.size());
for (int i = 0; i < a.size(); ++i) b[i] = &a[i];
start = clock();
for (int j = 0; j < J; ++j)
for (int i = 0; i < b.size(); ++i)
sum += *b[i];
std::cout << double(clock() - start) / CLOCKS_PER_SEC << " " << sum << std::endl;
Upvotes: 2
Views: 592
Reputation: 4097
Getting data from global memory is slow, so the CPU has a small bit of really fast memory to help memory access keep up with the processor. When handling memory requests, your computer will try and speed up future requests to an single integer or pointer in memory by requesting a whole bunch of them around the location you requested and storing them in cache. Once that fast memory is full is has to get rid of its least favorite bit whenever something new is requested.
Your small problems may fit entirely or substantially in cache and so memory access is super fast. Large problems can't fit in this fast memory so you have a problem. The vector is stored as K consecutive memory locations. When you access a vector of int
it loads the int and a handful of his nearby values these can be used right away. However, when you load an int*
it loads a pointer to an actual value as well as several other pointers. This takes up some memory. Then, when you dereference with *
it loads the actual value and possibly some actual values nearby. This takes up more memory. Not only do you have to perform more work, but you also are filling up memory faster. The actual increase in time will vary as it is highly dependent on the architecture, operation (in this case +
), and memory speeds. Also, your compiler will work quite hard to minimize the delays.
Upvotes: 1