Reputation: 991
I am trying to compare the matrix multiplication performance of Eigen (C++) and NumPy.
Here is the C++ code for the matrix multiplication:
#include <iostream>
#include <Eigen/Dense>
#include <ctime>
#include <iomanip>

using namespace Eigen;
using namespace std;

int main()
{
    time_t begin, end;
    double difference = 0;
    time(&begin);
    for (int i = 0; i < 500; ++i)
    {
        MatrixXd m1 = MatrixXd::Random(500, 500);
        MatrixXd m2 = MatrixXd::Random(500, 500);
        MatrixXd m3 = MatrixXd::Zero(500, 500);
        m3 = m1 * m2;
    }
    time(&end);
    difference = difftime(end, begin);
    std::cout << "time = " << std::setprecision(10) << (difference / 500.) << " seconds" << std::endl;
    return 0;
}
Compiling with g++ -Wall -Wextra -I "path-to-eigen-directory" prog5.cpp -o prog5 -O3 -std=gnu++0x
Output:
time = 0.116 seconds
Here is the Python code:
import timeit
import numpy as np

start_time = timeit.default_timer()
for i in range(500):
    m1 = np.random.rand(500, 500)
    m2 = np.random.rand(500, 500)
    m3 = np.zeros((500, 500))
    m3 = np.dot(m1, m2)
stop_time = timeit.default_timer()
print('Time = {} seconds'.format((stop_time - start_time) / 500))
Output:
Time = 0.01877937281645333 seconds
It looks like the C++ code is about 6 times slower than the Python code. Can someone give insight into what I am missing here?
I am using Eigen 3.3.4, the g++ compiler (MinGW.org GCC-6.3.0-1) 6.3.0, Python 3.6.1, and NumPy 1.11.3. Python is running in the Spyder IDE on Windows.
Update:
As per the answer and comments, I updated the code.
The C++ code is now compiled with g++ -Wall -Wextra -I "path-to-eigen-directory" prog5.cpp -o prog5 -O3 -std=gnu++0x -march=native. I couldn't get -fopenmp to work: there is no output at all when I use this flag.
#include <iostream>
#include <Eigen/Dense>
#include <ctime>
#include <iomanip>

using namespace Eigen;
using namespace std;

int main()
{
    time_t begin, end;
    double difference = 0;
    time(&begin);
    for (int i = 0; i < 10000; ++i)
    {
        MatrixXd m1 = MatrixXd::Random(500, 500);
        MatrixXd m2 = MatrixXd::Random(500, 500);
        MatrixXd m3 = MatrixXd::Zero(500, 500);
        m3 = m1 * m2;
    }
    time(&end); // note the time after execution
    difference = difftime(end, begin);
    std::cout << "Total time = " << difference << " seconds" << std::endl;
    std::cout << "Average time = " << std::setprecision(10) << (difference / 10000.) << " seconds" << std::endl;
    return 0;
}
Output:
Total time = 328 seconds
Average time = 0.0328 seconds
Python code:
import timeit
import numpy as np

start_time = timeit.default_timer()
for i in range(10000):
    m1 = np.random.rand(500, 500)
    m2 = np.random.rand(500, 500)
    m3 = np.zeros((500, 500))
    m3 = np.dot(m1, m2)
stop_time = timeit.default_timer()
print('Total time = {} seconds'.format(stop_time - start_time))
print('Average time = {} seconds'.format((stop_time - start_time) / 10000))
Running with the runfile('filename.py') command in the Spyder IDE.
Output:
Total time = 169.35587796526667 seconds
Average time = 0.016935587796526666 seconds
Now the performance with Eigen is better, but still not equal to or faster than NumPy. Maybe -fopenmp would do the trick, but I am not sure. However, I am not using any parallelization in NumPy, unless it is doing that implicitly.
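One way to check how much of the measured Python time is random-number generation rather than the multiply itself is to generate the operands once, outside the timed region. A minimal sketch (the matrix size and repetition count are only illustrative):

```python
import timeit
import numpy as np

n, reps = 500, 50

# Generate the operands once, outside the timed region,
# so the measurement covers only the multiplication.
m1 = np.random.rand(n, n)
m2 = np.random.rand(n, n)

t_dot = timeit.timeit(lambda: np.dot(m1, m2), number=reps) / reps

# For comparison, time the random generation alone.
t_rand = timeit.timeit(lambda: np.random.rand(n, n), number=reps) / reps

print('dot  : {:.6f} s per call'.format(t_dot))
print('rand : {:.6f} s per call'.format(t_rand))
```

If the rand time is a sizeable fraction of the dot time, the original loop is partly benchmarking the generator rather than the multiplication.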
Upvotes: 1
Views: 1305
Reputation: 29205
There are several issues with your benchmark:

- You are benchmarking the rand() function, which is very costly!
- You are missing -march=native to get AVX/FMA boosts.
- You are missing -fopenmp to enable multithreading.

On my quad-core i7 2.6GHz CPU I get:
- initial code: 0.024s
- after replacing `Random` by `Ones`: 0.018s
- adding `-march=native`: 0.006s
- adding `-fopenmp`: 0.003s
The matrix is a bit too small to get good multithreading benefits.
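On the NumPy side, whether the comparison is implicitly multithreaded depends on the BLAS library NumPy was built against (e.g. OpenBLAS or MKL), since np.dot on double-precision matrices is delegated to BLAS. A quick way to inspect which backend is linked:

```python
import numpy as np

# np.dot on float64 matrices is dispatched to the BLAS backend
# NumPy was compiled against; that backend may use several
# threads implicitly. show_config() prints the libraries that
# were linked at build time.
np.show_config()
```

If the printed configuration names a threaded backend, its thread count can typically be capped with environment variables such as OMP_NUM_THREADS or OPENBLAS_NUM_THREADS (set before the process starts), which makes for a fairer single-threaded comparison against an Eigen build without -fopenmp.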
Upvotes: 2