Reputation: 11
I have the following piece of code that scans a 3-D structure histMem,
where each of the 64x64 elements holds an array of 65536 elements representing a histogram.
The goal is to find, for each element, the location of the histogram bin with the highest count.
    int maxVal, maxLoc;
    for (int r = 0; r < 64; r++) {               // scan over 64 rows
        for (int c = 0; c < 64; c++) {           // scan over 64 columns
            maxVal = histMem[r][c][0];
            maxLoc = 0;
            for (int p = 0; p < nBins; p++) {    // scan over 65536 histogram bins
                if (histMem[r][c][p] > maxVal) { // update the max location and max value if needed
                    maxVal = histMem[r][c][p];
                    maxLoc = p;
                }
            }
        }
    }
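(For context: the snippet above overwrites maxVal and maxLoc on every cell. A minimal sketch of keeping the per-cell results, using hypothetical 64x64 output arrays peakVal and peakLoc that exist only for this illustration, would look like this:)
    // Sketch only: peakVal/peakLoc are hypothetical output arrays for the per-cell peaks.
    unsigned int peakVal[64][64];   // highest count found in each cell's histogram
    int          peakLoc[64][64];   // bin index of that highest count

    for (int r = 0; r < 64; r++) {
        for (int c = 0; c < 64; c++) {
            unsigned int maxVal = histMem[r][c][0];
            int maxLoc = 0;
            for (int p = 1; p < nBins; p++) {        // bin 0 already used as the initial max
                if (histMem[r][c][p] > maxVal) {
                    maxVal = histMem[r][c][p];
                    maxLoc = p;
                }
            }
            peakVal[r][c] = maxVal;                  // keep the result for this cell
            peakLoc[r][c] = maxLoc;
        }
    }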
The variable histMem has been declared as follows:
    unsigned int*** histMem;
and the memory allocation is done using the following function:
    histMem = createArrayMem(64, 64, 65536);
Specifically, this is what the function createArrayMem
does:
    unsigned int*** createArrayMem(int hSize, int vSize, int depth) {
        unsigned int*** arrayMem = new unsigned int** [hSize];
        for (int i = 0; i < hSize; i++) {
            // Allocate memory blocks for rows of each 2D array
            arrayMem[i] = new unsigned int* [vSize];
            for (int j = 0; j < vSize; j++) {
                // Allocate memory blocks for columns of each 2D array
                arrayMem[i][j] = new unsigned int[depth];
            }
        }
        return arrayMem;
    }
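(For reference, a sketch of how the same 64x64x65536 layout could be allocated as a single contiguous block; the function name createFlatArrayMem and the at helper are made up for this sketch, and it does not change the total amount of data that has to be scanned:)
    // Sketch only: one contiguous allocation holding the same 64x64x65536 layout.
    // Indexing becomes arithmetic instead of pointer chasing.
    unsigned int* createFlatArrayMem(int hSize, int vSize, int depth) {
        return new unsigned int[static_cast<size_t>(hSize) * vSize * depth];
    }

    // Hypothetical accessor for element (r, c, p) of the flat block.
    inline unsigned int& at(unsigned int* mem, int vSize, int depth,
                            int r, int c, int p) {
        return mem[(static_cast<size_t>(r) * vSize + c) * depth + p];
    }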
Now the problem is that finding the histogram peak for each of the 64x64 arrays of histMem
is extremely slow: it takes around 500 milliseconds.
Is there a way to make this simple operation faster?
Thank you all.
Upvotes: 1
Views: 1023
Reputation: 30860
I think what you are asking for is not realistic.
Your entire data structure is ~1 GB (64 x 64 histograms x 65536 bins x 4 bytes each). Scanning the entire thing 15 times per second requires 15 GB/s of memory bandwidth. This is at the upper end of what DDR3 supports, and well into mid-range territory for DDR4.
Furthermore, to achieve your stated goal of completing in 1-2 ms, you would need to read that much data in that amount of time, which works out to 500 GB/s to 1 TB/s. Does your computer have a memory bus that fast?
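A minimal sketch of the arithmetic behind those figures (the constants are the ones quoted above, nothing else is assumed):
    #include <cstdio>

    int main() {
        // 64 x 64 histograms, 65536 bins each, 4 bytes per unsigned int bin.
        const double bytes = 64.0 * 64.0 * 65536.0 * sizeof(unsigned int);
        const double gib   = bytes / (1024.0 * 1024.0 * 1024.0);            // ~1 GiB total

        std::printf("total data          : %.2f GiB\n", gib);
        std::printf("15 scans per second : %.1f GiB/s\n", 15.0 * gib);      // ~15 GB/s
        std::printf("one scan in 2 ms    : %.0f GiB/s\n", gib / 0.002);     // ~500 GB/s
        std::printf("one scan in 1 ms    : %.0f GiB/s\n", gib / 0.001);     // ~1 TB/s
        return 0;
    }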
Upvotes: 5