Reputation: 623
I've been studying cache locality recently and I'm trying to understand how CPUs access memory. I wrote an experiment to see whether there was a performance difference between looping over an array sequentially and using a lookup table to index into the data array. I was surprised to find the lookup method slightly faster. My code is below; I compiled with GCC on Windows (MinGW).
#include <stdlib.h>
#include <stdio.h>
#include <windows.h>

int main()
{
    DWORD dwElapsed, dwStartTime;
    // random arrangement of keys to look up
    int lookup_arr[] = {0, 3, 8, 7, 2, 1, 4, 5, 6, 9};
    // data for both loops
    int data_arr1[] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
    int data_arr2[] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};

    // first loop, sequential access
    dwStartTime = GetTickCount();
    for (int n = 0; n < 9000000; n++) {
        for (int i = 0; i < 10; i++)
            data_arr1[i]++;
    }
    dwElapsed = GetTickCount() - dwStartTime;
    printf("Normal loop completed: %lu\n", dwElapsed);  // DWORD is unsigned long, so %lu

    // second loop, indexes into data_arr2 using the lookup array
    dwStartTime = GetTickCount();
    for (int n = 0; n < 9000000; n++) {
        for (int i = 0; i < 10; i++)
            data_arr2[lookup_arr[i]]++;
    }
    dwElapsed = GetTickCount() - dwStartTime;
    printf("Lookup loop completed: %lu\n", dwElapsed);
    return 0;
}
Running this, I get:
Normal loop completed: 375
Lookup loop completed: 297
Upvotes: 1
Views: 281
Reputation: 46365
Following up on my earlier comments, here is how to do this kind of measurement. The result is a significant difference in speed with "randomized indexing".
#include <stdio.h>
#include <time.h>
#include <stdlib.h>
#include <math.h>

#define N 1000000
#define LOOPS 100

int main(void) {
    int *rArr = malloc(N * sizeof(int));  // random data
    int *rInd = malloc(N * sizeof(int));  // randomized indices
    int *lInd = malloc(N * sizeof(int));  // linear indices
    int ii;

    for (ii = 0; ii < N; ii++) {
        lInd[ii] = ii;
        rArr[ii] = rand();
        rInd[ii] = rand() % N;
    }

    int loopCount;
    int sum = 0;            // must be initialized before accumulating
    clock_t startT, stopT;  // clock() returns clock_t, not time_t
    double dt, totalT = 0, tt2 = 0;

    // sequential access: walk rArr through the linear index array
    startT = clock();
    for (loopCount = 0; loopCount < LOOPS; loopCount++) {
        for (ii = 0; ii < N; ii++) {
            sum += rArr[lInd[ii]];
        }
        stopT = clock();
        dt = stopT - startT;
        totalT += dt;
        tt2 += dt * dt;
        startT = stopT;
    }
    printf("sum is %d\n", sum);
    printf("total time: %lf += %lf\n",
           totalT / (double)CLOCKS_PER_SEC,
           sqrt((tt2 - totalT * totalT / LOOPS) / LOOPS) / (double)CLOCKS_PER_SEC);

    // randomized access: same work, but through the shuffled index array
    totalT = 0; tt2 = 0;
    startT = clock();
    for (loopCount = 0; loopCount < LOOPS; loopCount++) {
        for (ii = 0; ii < N; ii++) {
            sum += rArr[rInd[ii]];
        }
        stopT = clock();
        dt = stopT - startT;
        totalT += dt;
        tt2 += dt * dt;
        startT = stopT;
    }
    printf("sum is %d\n", sum);
    printf("total time: %lf += %lf\n",
           totalT / (double)CLOCKS_PER_SEC,
           sqrt((tt2 - totalT * totalT / LOOPS) / LOOPS) / (double)CLOCKS_PER_SEC);

    free(rArr);
    free(rInd);
    free(lInd);
    return 0;
}
Result - the sequential access is > 2x faster (on my machine):
sum is -1444272372
total time: 0.396539 += 0.000219
sum is 546230204
total time: 0.756407 += 0.001165
With -O3 optimization, the difference is even starker: a full 3x faster:
sum is -318372465
total time: 0.142444 += 0.013230
sum is 1672130111
total time: 0.455804 += 0.000402
Upvotes: 2
Reputation: 5176
I believe you are compiling without optimizations turned on. With -O2, g++ optimizes away everything so the run time is 0, and without the flag I get results similar to yours.
After modifying the program so that the values in data_arr1 and data_arr2 are actually used for something, I get 78 ms for both loops.
Upvotes: 1