Reputation: 23
I am developing a program, making heavy use of Armadillo library. I have the 10.8.2 version, linked against Intel oneAPI MKL 2022.0.2. At some point, I need to perform many sparse matrix times dense vector multiplications, both of which are defined using Armadillo structures. I have found this point to be a probable bottleneck, and was being curious if replacing the Armadillo multiplication with "bare bones" sparse CBLAS routines from MKL (mkl_sparse_d_mv) would speed things up. But in order to do so, I need to convert from Armadillo's SpMat to something that MKL understands. As per Armadillo docs, sparse matrices are stored in CSC format, so I have tried mkl_sparse_d_create_csc. My attempt at this is below:
#include <iostream>
#include <armadillo>
#include "mkl.h"
int main()
{
arma::umat locations = {{0, 0, 1, 3, 2},{0, 1, 0, 2, 3}};
// arma::vec vals = {0.5, 2.5, 2.5, 4.5, 4.5};
arma::vec vals = {0.5, 2.5, 3.5, 4.5, 5.5};
arma::sp_mat X(locations, vals);
std::cout << "X = \n" << arma::mat(X) << std::endl;
arma::vec v = {1,1,1,1};
arma::vec v2;
v2.resize(4);
std::cout << "v = \n" << v << std::endl;
std::cout << "X * v = \n" << X * v << std::endl;
MKL_INT *cols_beg = static_cast<MKL_INT *>(mkl_malloc(X.n_cols * sizeof(MKL_INT), 64));
MKL_INT *cols_end = static_cast<MKL_INT *>(mkl_malloc(X.n_cols * sizeof(MKL_INT), 64));
MKL_INT *row_idx = static_cast<MKL_INT *>(mkl_malloc(X.n_nonzero * sizeof(MKL_INT), 64));
double *values = static_cast<double *>(mkl_malloc(X.n_nonzero * sizeof(double), 64));
for (MKL_INT i = 0; i < X.n_cols; i++)
{
cols_beg[i] = static_cast<MKL_INT>(X.col_ptrs[i]);
cols_end[i] = static_cast<MKL_INT>((--X.end_col(i)).pos());
std::cout << cols_beg[i] << " --- " << cols_end[i] << std::endl;
}
std::cout << std::endl;
for (MKL_INT i = 0; i < X.n_nonzero; i++)
{
row_idx[i] = static_cast<MKL_INT>(X.row_indices[i]);
values[i] = X.values[i];
std::cout << row_idx[i] << " --- " << values[i] << std::endl;
}
std::cout << std::endl;
sparse_matrix_t X_mkl = NULL;
sparse_status_t res = mkl_sparse_d_create_csc(&X_mkl, SPARSE_INDEX_BASE_ZERO,
X.n_rows, X.n_cols, cols_beg, cols_end, row_idx, values);
if(res == SPARSE_STATUS_SUCCESS) std::cout << "Constructed mkl representation of X" << std::endl;
matrix_descr dsc;
dsc.type = SPARSE_MATRIX_TYPE_GENERAL;
sparse_status_t stat = mkl_sparse_d_mv(SPARSE_OPERATION_NON_TRANSPOSE, 1.0, X_mkl, dsc, v.memptr(), 0.0, v2.memptr());
std::cout << "Multiplication status = " << stat << std::endl;
if(stat == SPARSE_STATUS_SUCCESS)
{
std::cout << "Calculated X*v via mkl" << std::endl;
std::cout << v2;
}
mkl_free(cols_beg);
mkl_free(cols_end);
mkl_free(row_idx);
mkl_free(values);
mkl_sparse_destroy(X_mkl);
return 0;
}
I am compiling this code with (with the help of Link Line Advisor)
icpc -g testing.cpp -o intel_testing.out -DARMA_ALLOW_FAKE_GCC -O3 -xhost -Wall -Wextra -L${MKLROOT}/lib/intel64 -liomp5 -lpthread -lm -DMKL_ILP64 -qmkl=parallel -larmadillo
on Pop!_OS 21.10.
It compiles and runs without any problems. The output is as follows:
X =
0.5000 2.5000 0 0
3.5000 0 0 0
0 0 0 5.5000
0 0 4.5000 0
v =
1.0000
1.0000
1.0000
1.0000
X * v =
3.0000
3.5000
5.5000
4.5000
0 --- 1
2 --- 2
3 --- 3
4 --- 4
0 --- 0.5
1 --- 3.5
0 --- 2.5
3 --- 4.5
2 --- 5.5
Constructed mkl representation of X
Multiplication status = 0
Calculated X*v via mkl
0.5000
0
0
0
As we can see, the result of Armadillo's multiplication is correct, whereas the one from MKL is wrong. My question is this: Am I making a mistake somewhere? Or is there something wrong with MKL?. I suspect the former of course, but after spending considerable amount of time, cannot find anything. Any help would be much appreciated!
As CJR and Vidyalatha_Intel suggested, I have changed col_end to
cols_end[i] = static_cast<MKL_INT>((X.end_col(i)).pos());
The result is now
X =
0.5000 2.5000 0 0
3.5000 0 0 0
0 0 0 5.5000
0 0 4.5000 0
v =
1.0000
1.0000
1.0000
1.0000
X * v =
3.0000
3.5000
5.5000
4.5000
0 --- 2
2 --- 3
3 --- 4
4 --- 5
0 --- 0.5
1 --- 3.5
0 --- 2.5
3 --- 4.5
2 --- 5.5
Constructed mkl representation of X
Multiplication status = 0
Calculated X*v via mkl
4.0000
2.5000
0
0
col_end is indeed 2,3,4,5 as suggested, but the result is still wrong.
Upvotes: 2
Views: 1160
Reputation: 462
Yes, the cols_end array is incorrect as pointed out by CJR. They should be indexed as 2,3,4,5. Please see the documentation regarding the parameter to the function mkl_sparse_d_create_csc
cols_end:
This array contains col indices, such that cols_end[i] - ind - 1 is the last index of col i in the arrays values and row_indx. ind takes 0 for zero-based indexing and 1 for one-based indexing.
Change this line
cols_end[i] = static_cast<MKL_INT>((--X.end_col(i)).pos());
to
cols_end[i] = static_cast<MKL_INT>((X.end_col(i)).pos());
Now recompile and run the code. I've tested it and it is showing the correct results. Image with results and compilation command
Upvotes: 2