Elden Abob
Elden Abob

Reputation: 623

Why is this sequential array loop slower than a loop that uses a "lookup" array?

I've been studying cache locality recently and I'm trying to understand how CPUs access memory. I wrote an experiment to see if there was a performance difference when looping an array sequentially vs. using a lookup table of some sort to index into the data array. I was surprised to find the lookup method slightly faster. My code is below. I compiled with GCC on Windows (MinGW).

#include <stdlib.h>
#include <stdio.h>
#include <windows.h>

int main()
{
    DWORD dwElapsed, dwStartTime;

    //random arrangement of keys to lookup
    int lookup_arr[] = {0, 3, 8, 7, 2, 1, 4, 5, 6, 9};

    //data for both loops
    int data_arr1[] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
    int data_arr2[] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};

    //first loop, sequential access
    dwStartTime = GetTickCount();
    for (int n = 0; n < 9000000; n++) {
        for (int i = 0; i < 10; i++)
            data_arr1[i]++;
    }
    dwElapsed = GetTickCount() - dwStartTime;
    printf("Normal loop completed: %d\n", dwElapsed);

    //second loop, indexes into data_arr2 using the lookup array
    dwStartTime = GetTickCount();
    for (int n = 0; n < 9000000; n++) {
        for (int i = 0; i < 10; i++)
            data_arr2[lookup_arr[i]]++;
    }
    dwElapsed = GetTickCount() - dwStartTime;
    printf("Lookup loop completed: %d\n", dwElapsed);

    return 0;
}

Running this, I get:

Normal loop completed: 375
Lookup loop completed: 297

Upvotes: 1

Views: 281

Answers (2)

Floris
Floris

Reputation: 46365

Following up on my earlier comments, here is how you do this kind of thing.

  1. Repeated measurements
  2. Estimate error
  3. Large memory block
  4. Randomized vs linear indices (so either way you have an indirection)

The result is a significant difference in speed with the "randomized indexing".

#include <stdio.h>
#include <time.h>
#include <stdlib.h>
#include <math.h>

#define N 1000000

int main(void) {
  int *rArr;
  int *rInd; // randomized indices
  int *lInd; // linear indices
  int ii;

  rArr = malloc(N * sizeof(int) );
  rInd = malloc(N * sizeof(int) );
  lInd = malloc(N * sizeof(int) );

  for(ii = 0; ii < N; ii++) {
    lInd[ii] = ii;
    rArr[ii] = rand();
    rInd[ii] = rand()%N;
  }

  int loopCount;
  int sum;
  time_t startT, stopT;
  double dt, totalT=0, tt2=0;

  startT = clock();
  for(loopCount = 0; loopCount < 100; loopCount++) {
    for(ii = 0; ii < N; ii++) {
      sum += rArr[lInd[ii]];
    }
    stopT = clock();
    dt = stopT - startT;
    totalT += dt;
    tt2 += dt * dt;
    startT = stopT;
  }
  printf("sum is %d\n", sum);
  printf("total time: %lf += %lf\n", totalT/(double)(CLOCKS_PER_SEC), (tt2 - totalT * totalT / 100.0)/100.0 / (double)(CLOCKS_PER_SEC));

  totalT = 0; tt2 = 0;
  startT = clock();
  for(loopCount = 0; loopCount < 100; loopCount++) {
    for(ii = 0; ii < N; ii++) {
      sum += rArr[rInd[ii]];
    }
    stopT = clock();
    dt = stopT - startT;
    totalT += dt;
    tt2 += dt * dt;
    startT = stopT;
  }
  printf("sum is %d\n", sum);
  printf("total time: %lf += %lf\n", totalT/(double)(CLOCKS_PER_SEC), sqrt((tt2 - totalT * totalT / 100.0)/100.0) / (double)(CLOCKS_PER_SEC));
}

Result - the sequential access is > 2x faster (on my machine):

sum is -1444272372
total time: 0.396539 += 0.000219
sum is 546230204
total time: 0.756407 += 0.001165

With -O3 optimization, the difference is even starker - a full 3x faster:

sum is -318372465
total time: 0.142444 += 0.013230
sum is 1672130111
total time: 0.455804 += 0.000402

Upvotes: 2

Timo
Timo

Reputation: 5176

I believe you are compiling without optimizations turned on. With -O2 g++ optimizes away everything so the run time is 0, and without the flag I get similar results.

After modifying the program so that values in data_arr1 and data_arr2 are actually used for something I get 78ms for both.

Upvotes: 1

Related Questions