Reputation: 11
I'm calling Intel MKL to do SpMV on a CSR-format matrix. To speed it up, I set the number of threads with mkl_set_num_threads. However, as the number of threads increases, the measured performance gets worse. Any idea what's going on?
The documentation says the value passed to mkl_set_num_threads is only an upper limit and that fewer threads might actually be used, so I would expect performance to at least stay the same with more threads. Here is my code:
    for (int j = 1; j <= max_threads; ++j) {
        mkl_set_num_threads(j);
        mkl_scsrmv(&transa, &m, &k, &alpha, matdescra, val, col_idx, row_idx, &row_idx[1], x, &beta, y);
        start = clock();
        for (int i = 0; i < 100; ++i) {
            mkl_scsrmv(&transa, &m, &k, &alpha, matdescra, val, col_idx, row_idx, &row_idx[1], x, &beta, y);
        }
        end = clock();
        elapsed = end - start;
        cout << "the float CSR spmv performance is "
             << (double)nnz * ( (double)CLOCKS_PER_SEC / 10000000 ) / (double)elapsed
             << " Gflops using " << j << " threads" << endl;
    }
And here is the result:
the float CSR spmv performance is 0.19734 Gflops using 20 threads
By the way, I'm using gcc to compile with -lmkl_intel_lp64 -lmkl_gnu_thread -lmkl_core -lgomp -lpthread -lm -ldl
Any help would be appreciated. Thanks.
Upvotes: 1
Views: 223
Reputation: 96
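clock() returns the total CPU time used by the process. With multiple threads that total is summed over every thread, so it can go up even when the run finishes sooner. You should measure real-world (wall-clock) time instead.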
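As a rough illustration of measuring wall-clock time, here is a minimal sketch using std::chrono::steady_clock from the C++ standard library. The helper names wall_seconds and spmv_gflops are made up for this example and are not part of MKL; the Gflops helper assumes the same convention as the formula in the question (one operation per nonzero per call). Other wall-clock timers, such as omp_get_wtime() from OpenMP, should work just as well.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical helper: wall-clock seconds taken by `reps` calls of `work`,
// measured with a monotonic clock instead of clock().
template <typename F>
double wall_seconds(F&& work, int reps) {
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < reps; ++i) {
        work();
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Hypothetical helper: Gflops for `reps` SpMV calls on a matrix with `nnz`
// nonzeros, counting one operation per nonzero as the question's formula does.
double spmv_gflops(std::size_t nnz, int reps, double seconds) {
    assert(seconds > 0.0);
    return static_cast<double>(nnz) * reps / seconds / 1e9;
}

// Intended use inside the question's loop (not compiled here, needs MKL):
//   double t = wall_seconds([&] {
//       mkl_scsrmv(&transa, &m, &k, &alpha, matdescra, val, col_idx,
//                  row_idx, &row_idx[1], x, &beta, y);
//   }, 100);
//   cout << spmv_gflops(nnz, 100, t) << " Gflops using " << j << " threads\n";
```

With a wall-clock timer the reported time no longer grows just because more threads are busy, so the per-thread-count numbers become comparable.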
Upvotes: 1