Reputation: 23
I have a loop and inside a have a inner loop. How can I optimise it please in order to optimise execution time like avoiding accessing to memory many times to the same thing and avoid the maximum possible the addition and multiplication.
int n,m,x1,y1,x2,y2,cnst;
int N = 9600;
int M = 1800;
int temp11,temp12,temp13,temp14;
int temp21,temp22,temp23,temp24;
int *arr1 = new int [32000]; // suppose it's already filled
int *arr2 = new int [32000];// suppose it's already filled
int sumFirst = 0;
int maxFirst = 0;
int indexFirst = 0;
int sumSecond = 0;
int maxSecond = 0;
int indexSecond = 0;
int jump = 2400;
for( n = 0; n < N; n++)
{
temp14 = 0;
temp24 = 0;
for( m = 0; m < M; m++)
{
x1 = m + cnst;
y1 = m + n + cnst;
temp11 = arr1[x1];
temp12 = arr2[y1];
temp13 = temp11 * temp12;
temp14+= temp13;
x2 = m + cnst + jump;
y2 = m + n + cnst + jump;
temp21 = arr1[x2];
temp22 = arr2[y2];
temp23 = temp21 * temp22;
temp24+= temp23;
}
sumFirst += temp14;
if (temp14 > maxFirst)
{
maxFirst = temp14;
indexFirst = m;
}
sumSecond += temp24;
if (temp24 > maxSecond)
{
maxSecond = temp24;
indexSecond = n;
}
}
// At the end we use sum , index and max for first and second;
Upvotes: 0
Views: 108
Reputation: 13679
You are multiplying array elements and accumulating the result. This can be optimized by:
Look for CPU-specific SIMD way of doing this. Like _mm_mul_epi32
from SSE4.1 can possibly be used on x86-64. Before trying to write your own SIMD version with compiler intrinsics, make sure the compiler doesn't do it already for you.
As for parallel execution, look into omp, or using C++17 parallel accumulate.
Upvotes: 2