Reputation: 667
I have been working on matrix multiplication for the last few months and have run some tests using OpenMP and Eigen3.
The tests were run on the following machines:
Computer 1:
Intel Core i7-3610QM CPU @ 2.30 GHz / 6 GB DDR3
Computer 2:
Six-Core AMD Opteron(tm) Processor 2435 2.60 GHz (2 processors) / 16 GB
For OpenMP, the following matrix-matrix multiplication algorithm was used:
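/* N (assumed even) and the global matrices a, b, c are defined elsewhere. */
void matrix4openmp(void)
{
    int j;
    /* Each loop iteration handles a pair of columns (j, j+1) of c. */
    #pragma omp parallel for
    for (j = 0; j < N; j += 2) {
        double v1[N], v2[N];
        int i, k;
        /* Copy columns j and j+1 of b into contiguous buffers. */
        for (i = 0; i < N; i++) {
            v1[i] = b[i][j];
            v2[i] = b[i][j+1];
        }
        /* Compute a 2x2 block of c per pass over k. */
        for (i = 0; i < N; i += 2) {
            register double s00, s01, s10, s11;
            s00 = s01 = s10 = s11 = 0.0;
            for (k = 0; k < N; k++) {
                s00 += a[i][k]   * v1[k];
                s01 += a[i][k]   * v2[k];
                s10 += a[i+1][k] * v1[k];
                s11 += a[i+1][k] * v2[k];
            }
            c[i][j]     = s00;
            c[i][j+1]   = s01;
            c[i+1][j]   = s10;
            c[i+1][j+1] = s11;
        }
    }
}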
The results were as follows:
              Computer 1    Computer 2
Sequential     232.75600     536.21400
OpenMP           2.75764       7.62024
Eigen3           3.35090       1.92970
*Times are in seconds.
*The matrix sizes were 2700 x 2500 and 2500 x 2700.
*The sequential algorithm isn't the same as the OpenMP one; it's the simplest version of matrix-matrix multiplication and can be seen here: http://pastebin.com/Pc9AKAE8.
*SSE2 instructions were enabled for the Eigen3 tests.
*OpenMP uses the default number of threads, that is, all the cores that Windows detects, including virtual ones.
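To see what "default" means on a particular machine, the thread count that OpenMP would use can be queried at run time. The following is only a minimal sketch in C and was not part of the original tests; the helper names default_omp_threads and logical_processors are made up for illustration, and the _OPENMP guard is there so that the code also compiles (and reports 1) when OpenMP is not enabled.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: number of threads the next parallel region
   would use by default. Falls back to 1 when built without OpenMP. */
int default_omp_threads(void)
{
#ifdef _OPENMP
    int n = omp_get_max_threads();
#else
    int n = 1;
#endif
    assert(n >= 1);  /* sanity check: at least the calling thread */
    return n;
}

/* Hypothetical helper: processors visible to the OpenMP runtime.
   These are logical processors, so hyperthreads are counted too. */
int logical_processors(void)
{
#ifdef _OPENMP
    return omp_get_num_procs();
#else
    return 1;
#endif
}
```

On a hyperthreaded quad-core such as the i7 above, both functions would normally return 8 unless the OMP_NUM_THREADS environment variable says otherwise.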
As you can see, the OpenMP version is faster than the Eigen3 version on the first computer (i7). On computer 2 (2x Opteron), however, Eigen3 completely beats the OpenMP version, as well as every test run on computer 1.
Any idea why I get these results, and why Eigen3 isn't as fast on computer 1 as on computer 2?
Upvotes: 1
Views: 1158
Reputation: 667
Thanks for your answers.
The huge difference between the sequential and the parallel versions is due to different algorithms being used. The sequential version uses the usual naive O(N^3) algorithm without any optimizations, whilst the parallel versions are optimized versions that use blocks. Using the same algorithm, the sequential times are about 10 seconds (computer 1) and 50 seconds (computer 2). Sorry, I should have put these values in the first post.
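To make the difference between the two algorithms concrete, here is a small sketch in C that puts a naive triple loop next to a 2x2 blocked version and checks that they produce the same result. The blocked function mirrors the loop structure of matrix4openmp from the question, without the OpenMP pragma. The naive function is an assumption about what the "usual" version looks like, since the pastebin code is not reproduced here. DIM and all function names are illustrative only.

```c
#include <assert.h>

#define DIM 4  /* illustrative size; must be even for the 2x2 version */

/* Naive triple loop: one dot product per element of c. */
void matmul_naive(double a[DIM][DIM], double b[DIM][DIM], double c[DIM][DIM])
{
    for (int i = 0; i < DIM; i++)
        for (int j = 0; j < DIM; j++) {
            double s = 0.0;
            for (int k = 0; k < DIM; k++)
                s += a[i][k] * b[k][j];
            c[i][j] = s;
        }
}

/* 2x2 blocked version: two columns of b are copied into contiguous
   buffers, then four elements of c are computed per pass over k. */
void matmul_2x2(double a[DIM][DIM], double b[DIM][DIM], double c[DIM][DIM])
{
    assert(DIM % 2 == 0);
    for (int j = 0; j < DIM; j += 2) {
        double v1[DIM], v2[DIM];
        for (int i = 0; i < DIM; i++) {
            v1[i] = b[i][j];
            v2[i] = b[i][j + 1];
        }
        for (int i = 0; i < DIM; i += 2) {
            double s00 = 0.0, s01 = 0.0, s10 = 0.0, s11 = 0.0;
            for (int k = 0; k < DIM; k++) {
                s00 += a[i][k]     * v1[k];
                s01 += a[i][k]     * v2[k];
                s10 += a[i + 1][k] * v1[k];
                s11 += a[i + 1][k] * v2[k];
            }
            c[i][j]         = s00;
            c[i][j + 1]     = s01;
            c[i + 1][j]     = s10;
            c[i + 1][j + 1] = s11;
        }
    }
}

/* Returns 1 if both versions agree on a small integer-valued input
   (integer values keep the floating-point sums exact). */
int matmul_versions_agree(void)
{
    double a[DIM][DIM], b[DIM][DIM], c1[DIM][DIM], c2[DIM][DIM];
    for (int i = 0; i < DIM; i++)
        for (int j = 0; j < DIM; j++) {
            a[i][j] = i + j;
            b[i][j] = i - j;
        }
    matmul_naive(a, b, c1);
    matmul_2x2(a, b, c2);
    for (int i = 0; i < DIM; i++)
        for (int j = 0; j < DIM; j++)
            if (c1[i][j] != c2[i][j])
                return 0;
    return 1;
}
```

Both versions do the same O(N^3) arithmetic; the blocked one reads b through contiguous buffers and reuses each loaded value of a and b twice, which is the kind of optimization that can explain a large gap even on a single thread.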
The difference between Eigen3 and OpenMP performance on the first and second computers seems to be due to the number of threads launched versus the number of physical processors available. We found that Eigen3's performance gets worse if the number of threads launched is greater than the number of physical processors available, which is not the case for OpenMP.
In the tests, the number of threads launched in both cases was equal to the total number of processors (virtual + physical).
On computer 1, Eigen3 performs worse because the total number of processors (virtual + physical, due to hyperthreading) is greater than the number of physical processors.
On computer 2, Eigen3 performs better because the total number of processors is the same as the number of physical processors. If we use twice the number of physical processors as the number of threads, the performance of Eigen3 also degrades, while the OpenMP version in fact improves a little.
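One way to act on this observation is to cap the thread count at the number of physical cores before running the benchmark. The sketch below is a hedged illustration in C rather than the code used for the tests: limit_threads is a hypothetical helper, and the physical core count is passed in by the caller because OpenMP itself only reports logical processors. Eigen's multi-threading documentation lists omp_set_num_threads (alongside the OMP_NUM_THREADS environment variable and its own Eigen::setNbThreads) as a way to control the threads it uses when built with OpenMP, so the same cap should apply to the Eigen3 runs.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: cap the OpenMP thread count at the given number
   of physical cores and return the thread count now in effect.
   Returns 1 when built without OpenMP. */
int limit_threads(int physical_cores)
{
    assert(physical_cores >= 1);
#ifdef _OPENMP
    omp_set_num_threads(physical_cores);
    return omp_get_max_threads();
#else
    (void)physical_cores;  /* unused without OpenMP */
    return 1;
#endif
}
```

For the machines above this would mean calling limit_threads(4) on computer 1 (quad-core with hyperthreading) and limit_threads(12) on computer 2 (two six-core processors) before the multiplication.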
Upvotes: 1