Hristo
Hristo

Reputation: 46517

optimize 2D array in C++

I'm dealing with a 2D array with the following characteristics:

const int cols = 500; 
const int rows = 100; 
int arr[rows][cols];

I access array arr in the following manner to do some work:

for(int k = 0; k < T; ++k) { // for each trainee
  myscore[k] = 0;
  for(int i = 0; i < cols; ++i) { // for each sample  
    for(int j = 0; j < rows; ++j) { // for each expert
      myscore[k] += delta(i, anotherArray[k][i], arr[j][i]);
    }   
  }
}

So I am worried about the array 'arr' and not the other one. I need to make this more cache-friendly and also boost the speed. I was thinking perhaps transposing the array but I wasn't sure how to do that. My implementation turns out to only work for square matrices. How would I make it work for non-square matrices?

Also, would mapping the 2D array into a 1D array boost the performance? If so, how would I do that? Finally, any other advice on how else I can optimize this... I've run out of ideas, but I know that arr[j][i] is the place where I need to make changes because I'm accessing columns by columns instead of rows by rows so that is not cache friendly at all.

Thanks, Hristo

Upvotes: 0

Views: 4508

Answers (4)

mingos
mingos

Reputation: 24512

Yes, 1d should be faster than 2d. C and C++ arrays are always 1d (internally). When you call something like

array[row][col]

the compiler actually calculates

col + row * maxcols

and uses that as the actual index of a 1d array. You might as well do that yourself. Cycling through an entire array will be way faster, and random access will be equally fast as in a 2d array.

Upvotes: 2

Potatoswatter
Potatoswatter

Reputation: 137860

  for(int i = 0; i < N; ++i) { // for each sample  
    for(int j = 0; j < E[i]; ++j) { // for each expert
      ... arr[j][i] ... // each ++j causes a large stride => poor caching
    }   
  }

transpose the loops:

  for(int j = 0; j < E[i]; ++j) { // for each expert
    for(int i = 0; i < N; ++i) { // for each sample  
      ... arr[j][i] ... // each ++i looks to the next word in memory => good
    }   
  }

Of course, without seeing everything else in the program, I can't say if that would cause a problem. If delta doesn't have side effects, you should be fine.

Upvotes: 1

polygenelubricants
polygenelubricants

Reputation: 383866

A general in-place matrix transposition is very difficult, but if you're okay with transposing it to another array, then it's pretty simple.

const int cols = 500; 
const int rows = 100; 

int arr[rows][cols];
// fill arr[][]

int arrT[cols][rows];
for (int r = 0; r < rows; r++) {
   for (int c = 0; c < cols; c++) {
      arrT[c][r] = arr[r][c];
   }
}

Of course, depending on how you're getting arr[][], you can just fill arrT[][] directly instead.

However, there may be a simpler solution of simple swapping the order of the loops.

for(int k = 0; k < T; ++k) { // for each trainee
  myscore[k] = 0;
  for(int j = 0; j < rows; ++j) { // for each expert
    for(int i = 0; i < cols; ++i) { // for each sample  
      myscore[k] += delta(i, anotherArray[k][i], arr[j][i]);
    }   
  }
}

Upvotes: 2

Yann Ramin
Yann Ramin

Reputation: 33197

You want memory accesses to be adjacent. In your case simply swap I and j when accessing arr.

Upvotes: 0

Related Questions