Reputation: 46517
I'm dealing with a 2D array with the following characteristics:
const int cols = 500;
const int rows = 100;
int arr[rows][cols];
I access array arr in the following manner to do some work:
for(int k = 0; k < T; ++k) { // for each trainee
myscore[k] = 0;
for(int i = 0; i < cols; ++i) { // for each sample
for(int j = 0; j < rows; ++j) { // for each expert
myscore[k] += delta(i, anotherArray[k][i], arr[j][i]);
}
}
}
So I am worried about the array 'arr' and not the other one. I need to make this more cache-friendly and also boost the speed. I was thinking perhaps transposing the array but I wasn't sure how to do that. My implementation turns out to only work for square matrices. How would I make it work for non-square matrices?
Also, would mapping the 2D array into a 1D array boost the performance? If so, how would I do that? Finally, any other advice on how else I can optimize this... I've run out of ideas, but I know that arr[j][i] is the place where I need to make changes because I'm accessing columns by columns instead of rows by rows so that is not cache friendly at all.
Thanks, Hristo
Upvotes: 0
Views: 4508
Reputation: 24512
Yes, 1d should be faster than 2d. C and C++ arrays are always 1d (internally). When you call something like
array[row][col]
the compiler actually calculates
col + row * maxcols
and uses that as the actual index of a 1d array. You might as well do that yourself. Cycling through an entire array will be way faster, and random access will be equally fast as in a 2d array.
Upvotes: 2
Reputation: 137860
for(int i = 0; i < N; ++i) { // for each sample
for(int j = 0; j < E[i]; ++j) { // for each expert
... arr[j][i] ... // each ++j causes a large stride => poor caching
}
}
transpose the loops:
for(int j = 0; j < E[i]; ++j) { // for each expert
for(int i = 0; i < N; ++i) { // for each sample
... arr[j][i] ... // each ++i looks to the next word in memory => good
}
}
Of course, without seeing everything else in the program, I can't say if that would cause a problem. If delta
doesn't have side effects, you should be fine.
Upvotes: 1
Reputation: 383866
A general in-place matrix transposition is very difficult, but if you're okay with transposing it to another array, then it's pretty simple.
const int cols = 500;
const int rows = 100;
int arr[rows][cols];
// fill arr[][]
int arrT[cols][rows];
for (int r = 0; r < rows; r++) {
for (int c = 0; c < cols; c++) {
arrT[c][r] = arr[r][c];
}
}
Of course, depending on how you're getting arr[][]
, you can just fill arrT[][]
directly instead.
However, there may be a simpler solution of simple swapping the order of the loops.
for(int k = 0; k < T; ++k) { // for each trainee
myscore[k] = 0;
for(int j = 0; j < rows; ++j) { // for each expert
for(int i = 0; i < cols; ++i) { // for each sample
myscore[k] += delta(i, anotherArray[k][i], arr[j][i]);
}
}
}
Upvotes: 2
Reputation: 33197
You want memory accesses to be adjacent. In your case simply swap I and j when accessing arr.
Upvotes: 0