Reputation: 459
I have the following two pieces of code; they do matrix multiplication using OpenMP.
Let's assume a is m x n
and b is n x k.
for (i = 0; i < m; i++)
    for (j = 0; j < k; j++)
        for (l = 0; l < n; l++)
            c[i * k + j] += a[i * n + l] * b[l * k + j];
and
for (i = 0; i < m; i++)
    for (j = 0; j < k; j++)
        for (l = 0; l < n; l++)
            c[i][j] += a[i][l] * b[l][j];
The latter runs about 1.8x faster when compiled with g++ -O3. Why is that? As far as I know, a 2D array is laid out in memory just like my 1D indexing, which is why I'm confused.
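To illustrate what I mean by "work just like": for a statically sized array I expect the element addresses to line up exactly, something like the following sketch (the sizes here are placeholders, not my real dimensions):

#include <cassert>

constexpr int M = 4, K = 5;
double c2[M][K];          // 2D version
double* c1 = &c2[0][0];   // the same storage viewed as a flat 1D array

int main() {
    for (int i = 0; i < M; i++)
        for (int j = 0; j < K; j++)
            assert(&c2[i][j] == &c1[i * K + j]);  // identical addresses
}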
Upvotes: 0
Views: 973
Reputation: 6727
For me, it takes the same time.
mat-mult.cpp:
#define DIM 1000
#ifdef D2
double a[DIM][DIM], b[DIM][DIM], c[DIM][DIM];
#else
double a[DIM*DIM], b[DIM*DIM], c[DIM*DIM];
#endif
void Test() {
    int i, j, l;
    int k = DIM, m = DIM, n = DIM;

    // Initialize a and b, zero c.
    for (i = 0; i < DIM; i++)
        for (j = 0; j < DIM; j++) {
#ifdef D2
            a[i][j] = i + j;
            b[i][j] = i - j;
            c[i][j] = 0;
#else
            a[i*DIM+j] = i + j;
            b[i*DIM+j] = i - j;
            c[i*DIM+j] = 0;
#endif
        }

    // c = a * b
#ifdef D2
    for (i = 0; i < m; i++)
        for (j = 0; j < k; j++)
            for (l = 0; l < n; l++)
                c[i][j] += a[i][l] * b[l][j];
#else
    for (i = 0; i < m; i++)
        for (j = 0; j < k; j++)
            for (l = 0; l < n; l++)
                c[i * k + j] += a[i * n + l] * b[l * k + j];
#endif
}

int main() {
    Test();
}
Results:
[~/CPP] g++ -O3 -o mat-mult mat-mult.cpp
[~/CPP] ./mat-mult
2.248u 0.064s 0:02.36 97.4% 0+0k 0+0io 0pf+0w
[~/CPP] ./mat-mult
2.272u 0.020s 0:02.32 98.7% 0+0k 0+0io 0pf+0w
[~/CPP] g++ -DD2 -O3 -o mat-mult mat-mult.cpp
[~/CPP] ./mat-mult
2.244u 0.040s 0:02.33 97.8% 0+0k 0+0io 0pf+0w
[~/CPP] ./mat-mult
2.220u 0.032s 0:02.30 97.8% 0+0k 0+0io 0pf+0w
You should give a minimal reproducible example. A minimal reproducible example is the full code that we can copy-paste and run, along with the compilation flags. The code should be minimal, that is, contain only what is needed to reproduce your problem: if something can be removed from your code and the problem remains, it is not minimal.
Currently, we have no idea what the dimensions of your arrays are, or even what flags you use besides -O3.
As it stands, your question is a classic example of a bad question: it throws out incomplete information and hopes we can fill in the missing pieces through telepathy.
Upvotes: 1
Reputation: 320
Your question contains a red herring, so I'm going to deviate a little to the important point in all of this: what is fast and what is slow in matrix multiplication.
First, you should not be doing matrix multiplication manually. Consider using a high-performance library such as OpenBLAS. OpenBLAS exposes the BLAS interface, which is obnoxious to use directly, but you can avoid that by going through a library that provides a nice C++ wrapper, such as Armadillo, which can link against any BLAS implementation of your choice.
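As a rough sketch of what that looks like with Armadillo (the matrix names and sizes here are placeholders; the multiplication itself is forwarded to whatever BLAS you link against, e.g. OpenBLAS):

#include <armadillo>

int main() {
    const int m = 1000, n = 1000, k = 1000;

    // Random matrices, just for illustration.
    arma::mat A(m, n, arma::fill::randu);
    arma::mat B(n, k, arma::fill::randu);

    // The heavy lifting is delegated to the linked BLAS library.
    arma::mat C = A * B;

    return 0;
}

Building is roughly g++ -O3 example.cpp -larmadillo, depending on how Armadillo is installed on your system.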
Having said that, let's discuss why matrix multiplication is fast or slow.
As for why the second version is faster than the first in your case, I have no idea. It is probably due to some compiler optimization that is not obvious; one would have to analyze the generated assembly to tell. However...
When 2D indexing comes out faster than 1D, it is generally because the 1D multiplication is not being done right: you are omitting a very important performance optimization, namely vectorization.
Done right, the first (1D) way should be much faster than the second. Processors can perform the multiplications of several consecutive elements in memory with a single instruction (consider reading this for more info). That only happens when the elements you are multiplying really are consecutive in memory, and the way to get that is to transpose one of the matrices before doing the multiplication, roughly as in the sketch below.
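A rough sketch of what I mean, reusing the 1D row-major layout from the question (the function name is mine, purely for illustration):

#include <vector>

// Sketch: c = a * b with a (m x n), b (n x k), c (m x k), all 1D row-major.
// b is first transposed into bt (k x n) so the inner loop reads both
// operands contiguously, which the compiler can vectorize.
void matmul_transposed(const double* a, const double* b, double* c,
                       int m, int n, int k) {
    std::vector<double> bt(static_cast<size_t>(k) * n);
    for (int l = 0; l < n; l++)
        for (int j = 0; j < k; j++)
            bt[j * n + l] = b[l * k + j];

    for (int i = 0; i < m; i++)
        for (int j = 0; j < k; j++) {
            double sum = 0.0;
            for (int l = 0; l < n; l++)       // unit stride in both a and bt
                sum += a[i * n + l] * bt[j * n + l];
            c[i * k + j] = sum;
        }
}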
There are also other factors that make the 1D multiplication faster, such as cache locality and, in multi-threaded programs, false sharing.
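Since the question mentions OpenMP: a common way to sidestep false sharing is to parallelize over whole rows of c, so each thread writes only to its own contiguous block of the output. A sketch, assuming b has already been transposed into bt as above (compile with -fopenmp):

// Each thread owns whole rows of c, so no two threads write to the same
// cache line of the output (no false sharing), and each thread streams
// through a and bt contiguously (good cache locality).
void matmul_omp(const double* a, const double* bt, double* c,
                int m, int n, int k) {
    #pragma omp parallel for
    for (int i = 0; i < m; i++)
        for (int j = 0; j < k; j++) {
            double sum = 0.0;
            for (int l = 0; l < n; l++)
                sum += a[i * n + l] * bt[j * n + l];
            c[i * k + j] = sum;
        }
}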
On the Scientific Computing Stack Exchange, this question discusses beating libraries that are very highly optimized for matrix multiplication. That is of course highly unlikely, but there you'll find an example that comes close to doing it by letting the compiler take advantage of vectorization.
Upvotes: 1