stefano1

C++ Matrix product: increased speed with little changes

I'm writing code to multiply a sparse matrix by a full matrix.

I've created two classes, SparseMatrix and Matrix, which store their data as a vector of shared pointers to vectors. In the SparseMatrix case I store each item as an object, called SparseMatrixItem, with two attributes: position and value. In the Matrix case I simply store the value. Both can be row-based or column-based, controlled by a bool attribute.
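The layout described above could be sketched roughly as follows. This is only a hypothetical reconstruction (the names `SparseMatrixSketch` and `lines` are my own; the real classes surely have more members):

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical sketch of the described layout; real class names
// and members may differ.
template <typename scalar>
struct SparseMatrixItem {
    int position;  // 1-based index within the row (or column)
    scalar value;
};

template <typename scalar>
struct SparseMatrixSketch {
    bool rowBased;  // true: rows stored contiguously; false: columns
    // One shared_ptr per row (or column), each owning its stored items.
    std::vector<std::shared_ptr<std::vector<SparseMatrixItem<scalar>>>> lines;
};
```

Only nonzero entries are stored per line, which is why each item carries its position explicitly.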

Now I'm trying to write an efficient version of the standard product of the two matrices. For simplicity, the first implementation handles only the case where the first matrix is a row-based SparseMatrix and the second is a column-based Matrix. The code lives in the SparseMatrix class as an overload of operator *.

Here is my implementation:

template <typename scalar>
Matrix<scalar> SparseVectorMatrix<scalar>::operator*(Matrix<scalar> &input2) {
    Matrix<scalar> newMatrix(getNumberOfRows(), input2.getNumberOfColumns(), true);
    int numberOfRows = newMatrix.getNumberOfRows();
    int numberOfColumns = newMatrix.getNumberOfColumns();

    for (int i = 0; i < numberOfRows; i++) {
        vector<SparseMatrixItem<scalar>>& readRow(*horizontalVectorMatrix[i]);
        vector<scalar>& writeRow(*newMatrix.internalMatrix[i]);

        for (int j = 0; j < numberOfColumns; j++) {
            vector<scalar>& readColumn(*input2.internalMatrix[j]);
            writeRow[j] = fastScalarProduct(readRow, readColumn);
        }
    }
    return newMatrix;
}

The strange thing I cannot figure out is that if I swap the order of the two loops, performance improves dramatically. I tested it with two matrices, 6040x4000 and 4000x6040: the first implementation took nearly 30 seconds, while the second took only 12. Here is the second version:

template <typename scalar>
Matrix<scalar> SparseVectorMatrix<scalar>::operator*(Matrix<scalar> &input2) {
    Matrix<scalar> newMatrix(getNumberOfRows(), input2.getNumberOfColumns(), true);
    int numberOfRows = newMatrix.getNumberOfRows();
    int numberOfColumns = newMatrix.getNumberOfColumns();

    for (int j = 0; j < numberOfColumns; j++) {
        vector<scalar>& readColumn(*input2.internalMatrix[j]);
        vector<scalar>& writeColumn(*newMatrix.internalMatrix[j]);

        for (int i = 0; i < numberOfRows; i++) {
            vector<SparseMatrixItem<scalar>>& readRow(*horizontalVectorMatrix[i]);
            writeColumn[i] = fastScalarProduct(readRow, readColumn);
        }
    }
    return newMatrix;
}

Here is also the code of the fastScalarProduct() function that I use:

template <typename scalar>
scalar SparseVectorMatrix<scalar>::fastScalarProduct
    ( vector<SparseMatrixItem<scalar>> &vector1
    , const vector<scalar> &vector2
    ) {
    scalar totalSum = 0;
    auto sizeVector1 = vector1.size();

    for (decltype(sizeVector1) i = 0; i < sizeVector1; i++) {
        int position = vector1[i].position - 1;  // positions are stored 1-based
        if (vector2[position]) {
            totalSum += vector1[i].value * vector2[position];
        }
    }
    return totalSum;
}

I tried the same product in MATLAB and it takes only about 1.5 seconds. I suspect there are issues with cache memory, but since I'm new to this kind of problem I cannot figure out the real cause.

I'm also trying to write an efficient full-matrix product, and I'm facing the same problem there.

Upvotes: 1

Views: 120

Answers (1)

Laurent Huot

You are right in saying the "issue" is with cache memory. I suggest you read about locality of reference (http://en.wikipedia.org/wiki/Locality_of_reference), which explains why your program runs faster when the loop with the most iterations is inside the one with fewer iterations. Basically, arrays are linear data structures, and they make great use of spatial locality.
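The effect is easy to reproduce with a toy example unrelated to your classes. Both functions below compute the same sum over a row-major matrix stored in one flat vector; the first walks memory contiguously (good spatial locality), while the second jumps by a stride of `cols` on every inner iteration, which causes far more cache misses on large matrices:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major matrix, summed row by row: the inner loop touches
// consecutive addresses, so each cache line is fully used.
double sumRowOrder(const std::vector<double>& a,
                   std::size_t rows, std::size_t cols) {
    double s = 0.0;
    for (std::size_t i = 0; i < rows; ++i)
        for (std::size_t j = 0; j < cols; ++j)  // contiguous walk
            s += a[i * cols + j];
    return s;
}

// Same matrix, summed column by column: the inner loop strides by
// `cols` doubles each step, so most of each fetched cache line is wasted.
double sumColumnOrder(const std::vector<double>& a,
                      std::size_t rows, std::size_t cols) {
    double s = 0.0;
    for (std::size_t j = 0; j < cols; ++j)
        for (std::size_t i = 0; i < rows; ++i)  // stride of `cols`
            s += a[i * cols + j];
    return s;
}
```

Timing these two functions on a matrix of a few thousand rows and columns should show a gap of the same flavor as your 30 s vs 12 s measurement, even though they perform the identical arithmetic.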

As for the time it took to run the algorithm in matlab vs C++, I suggest you read this post: Why is MATLAB so fast in matrix multiplication?

Upvotes: 1
