ATenn

Reputation: 11

Is there a way to speed up a nested for loop in C/C++?

I have the following piece of code which scans a 3-D structure histMem where each of the 64x64 elements contains an array of 65536 elements representing a histogram. The goal is to find the location of the histogram bin with the highest counts.

            unsigned int maxVal; //same type as the elements of histMem
            int maxLoc;
            for (int r = 0; r < 64; r++) { //scan over 64 rows
                for (int c = 0; c < 64; c++) { //scan over 64 columns
                    maxVal = histMem[r][c][0];
                    maxLoc = 0;
                    for (int p = 0; p < nBins; p++) { //scan over 65536 histogram bins
                        if (histMem[r][c][p]> maxVal) { //update the max location and max value if needed
                            maxVal = histMem[r][c][p];
                            maxLoc = p;
                        }
                    }
                }
            }

The variable histMem is declared as follows:

unsigned int*** histMem;

and the memory allocation is done using the following function:

histMem = createArrayMem(64,64,65536);

Specifically, this is what the function createArrayMem does:

unsigned int*** createArrayMem(int hSize, int vSize, int depth) {

    unsigned int*** arrayMem = new unsigned int** [hSize];

    for (int i = 0; i < hSize; i++) {
        // Allocate memory blocks for rows of each 2D array
        arrayMem[i] = new unsigned int* [vSize];
        for (int j = 0; j < vSize; j++) {
            // Allocate memory blocks for columns of each 2D array
            arrayMem[i][j] = new unsigned int[depth];
        }
    }

    return arrayMem;
}

The problem is that finding the histogram peak for each of the 64x64 arrays in histMem is extremely slow: the whole scan takes around 500 milliseconds.

Is there a way to make this simple operation faster?

Thank you all.

Upvotes: 1

Views: 1023

Answers (2)

Botje

Reputation: 30860

I think what you are asking for is not realistic.

Your entire data structure is about 1 GB (64 x 64 histograms x 65536 bins x 4 bytes per bin, assuming a 4-byte unsigned int). Scanning all of it 15 times per second requires about 15 GB/s of memory bandwidth. That is at the upper end of what DDR3 supports, and well into mid-range territory for DDR4.

Furthermore, to achieve your stated goal of completing in 1-2 ms, you would need to read that much data in that amount of time. Does your computer have a 1 TB/s or 500 GB/s memory bus?
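
To make the arithmetic concrete, here is a small sketch. The helper names totalBytes and requiredBandwidth are made up for illustration, and the figures assume unsigned int is 4 bytes, which is typical but not guaranteed.

```cpp
#include <cstdint>

// Total bytes held by a rows x cols grid of histograms,
// each with `bins` counters of `bytesPerBin` bytes.
// (Hypothetical helper, not part of the question's code.)
constexpr std::uint64_t totalBytes(std::uint64_t rows, std::uint64_t cols,
                                   std::uint64_t bins, std::uint64_t bytesPerBin) {
    return rows * cols * bins * bytesPerBin;
}

// Bandwidth in bytes per second needed to read `bytes` exactly once
// within `seconds`. (Hypothetical helper.)
constexpr double requiredBandwidth(std::uint64_t bytes, double seconds) {
    return static_cast<double>(bytes) / seconds;
}
```

With the dimensions from the question, totalBytes(64, 64, 65536, 4) is 1073741824 bytes (1 GiB). Reading that once in 1 ms would need roughly 1.07e12 bytes/s, while the 500 ms reported in the question corresponds to roughly 2.1e9 bytes/s.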

Upvotes: 5

Alex

Reputation: 42

Maybe you can try parallel_for.
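
I did not say which library's parallel_for I meant, so here is a rough sketch of the same idea using plain std::thread from the standard library instead. The function names peakBin and findPeaks, and the nThreads parameter, are my own for illustration. Rows are split across worker threads, and each histogram's peak is stored in a flat result vector. Since the scan reads about 1 GB, memory bandwidth may limit how much this helps, and some toolchains need -pthread to link it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Index of the first largest element in hist[0..nBins).
// (Hypothetical helper; std::max_element returns the first maximum,
// matching the strict '>' comparison in the question's loop.)
std::size_t peakBin(const unsigned int* hist, std::size_t nBins) {
    return static_cast<std::size_t>(std::max_element(hist, hist + nBins) - hist);
}

// Returns a vector where element [r * cols + c] is the peak bin of
// histMem[r][c]. Thread t handles rows t, t + nThreads, t + 2 * nThreads, ...
// (Hypothetical helper; nThreads is an assumed tuning parameter.)
std::vector<std::size_t> findPeaks(unsigned int*** histMem, std::size_t rows,
                                   std::size_t cols, std::size_t nBins,
                                   unsigned nThreads) {
    assert(nThreads > 0);
    std::vector<std::size_t> peaks(rows * cols);
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < nThreads; ++t) {
        workers.emplace_back([&, t] {
            for (std::size_t r = t; r < rows; r += nThreads) {
                for (std::size_t c = 0; c < cols; ++c) {
                    // Each thread writes to distinct elements, so no locking is needed.
                    peaks[r * cols + c] = peakBin(histMem[r][c], nBins);
                }
            }
        });
    }
    // Wait for every worker before the result is returned.
    for (auto& w : workers) {
        w.join();
    }
    return peaks;
}
```

For the question's data this would be called as findPeaks(histMem, 64, 64, 65536, n), where n could be taken from std::thread::hardware_concurrency() when that returns a non-zero value.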

Upvotes: -2
