Da Kuang

Reputation: 892

How does Matlab transpose a sparse matrix?

I've been wondering about this question for quite a while but cannot find a reference: How does Matlab transpose a sparse matrix so fast, given that it is stored in CSC (compressed sparse column) format?

Also, its documentation confirms that sparse matrix transposition is efficient:

To do this (accessing row by row), you can transpose the matrix, perform operations on the columns, and then retranspose the result … The time required to transpose the matrix is negligible.

Follow-up (modified as suggested by @Mikhail):

I agree with @Roger and @Mikhail that setting a flag is enough for many operations, such as the BLAS or Sparse BLAS routines, as far as their interfaces are concerned. But it appears to me that Matlab performs an "actual" transposition. For example, I have a sparse matrix X of size m*n = 7984*12411, and I want to scale each column and each row:

% scaling each column
t = 0;
for i = 1 : 1000
    A = X; t0 = tic;
    A = bsxfun(@times, A, rand(1,n));
    t = t + toc(t0);
end

t = 0.023636 seconds

% scaling each row
t = 0;
for i = 1 : 1000
    A = X; t0 = tic;
    A = bsxfun(@times, A, rand(m,1));
    t = t + toc(t0);
end

t = 138.3586 seconds

% scaling each row by transposing X and transforming back
t = 0;
for i = 1 : 1000
    A = X; t0 = tic;
    A = A'; A = bsxfun(@times, A, rand(1,m)); A = A';
    t = t + toc(t0);
end

t = 19.5433 seconds

This result shows that accessing the matrix column by column is faster than accessing it row by row, which makes sense because sparse matrices are stored column by column. So the only explanation for the fast column scaling of X' is that X is actually transposed into a new matrix, rather than just having a flag set.

Also, if every sparse matrix is stored in CSC format, then simply setting a flag cannot produce X' in CSC format.

Any comments? Thanks in advance.

Upvotes: 13

Views: 3463

Answers (3)

K. Miller

Reputation: 131

I realize I am a bit late to the game, but I thought I could help shed some light on this question. Transposing a sparse matrix is actually a straightforward task that can be accomplished in time proportional to the number of nonzero elements in the input matrix. Suppose that A is an m x n matrix stored in CSC format, i.e., A is defined by three arrays (a small numeric example follows the list):

  1. elemsA, of length nnz(A), that stores the nonzero elements in A
  2. prowA, of length nnz(A), that stores the row indices of nonzero elements in A
  3. pcolA, of length n + 1, such that all nonzero elements in column j of A are indexed by the range [pcolA(j), pcolA(j + 1))
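
For instance, take a small 3 x 3 matrix (this is just an illustration using 1-based indices, matching the notation above; the arrays MATLAB actually stores internally, accessible via mxGetIr/mxGetJc in a mex file, use 0-based row indices and column offsets):

A = sparse([1 2 2 3], [1 2 3 3], [5 8 3 6], 3, 3);
full(A)
%      5     0     0
%      0     8     3
%      0     0     6
% CSC representation of A in the notation above:
% elemsA = [5 8 3 6]
% prowA  = [1 2 2 3]
% pcolA  = [1 2 3 5]   % column j occupies elemsA(pcolA(j) : pcolA(j+1) - 1)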

If B denotes the transpose of A, then our goal is to define the analogous arrays elemsB, prowB, pcolB. To do so, we use the fact that the rows of A form the columns of B. Let tmp be an array such that tmp(1) = 1 and tmp(i + 1) is the number of elements in row i of A for i = 1,...,m. Then tmp(i + 1) is also the number of elements in column i of B, so the cumulative sum of tmp is exactly pcolB (in 1-based form, with pcolB(1) = 1). Now suppose that tmp has been overwritten by its cumulative sum. Then elemsB and prowB can be populated as follows:

    for j = 1,...,n
        for k = pcolA(j),...,pcolA(j + 1) - 1
            prowB(tmp(prowA(k))) = j
            elemsB(tmp(prowA(k))) = elemsA(k)
            tmp(prowA(k)) = tmp(prowA(k)) + 1
        end
    end

The array tmp is used to index into prowB and elemsB when adding a new element, and is then updated accordingly. Putting this all together, we can write a mex file in C++ that implements the transpose algorithm:

#include "mex.h"
#include <vector>
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[]) {     
    // check input output
    if (nrhs != 1)
       mexErrMsgTxt("One input argument required");
    if (nlhs > 1)
       mexErrMsgTxt("Too many output arguments");

    // get input sparse matrix A
    if (mxIsEmpty(prhs[0])) { // is A empty?
        plhs[0] = mxCreateSparse(0, 0, 0, mxREAL);
        return;
    }
    if (!mxIsSparse(prhs[0]) || mxIsComplex(prhs[0])) // is A real and sparse?
       mexErrMsgTxt("Input matrix must be real and sparse");
    double* A = mxGetPr(prhs[0]);           // real vector for A
    mwIndex* prowA = mxGetIr(prhs[0]);      // row indices for elements of A
    mwIndex* pcolindexA = mxGetJc(prhs[0]); // index into the columns
    mwSize M = mxGetM(prhs[0]);             // number of rows in A
    mwSize N = mxGetN(prhs[0]);             // number of columns in A

    // allocate memory for A^T
    plhs[0] = mxCreateSparse(N, M, pcolindexA[N], mxREAL);
    double* outAt = mxGetPr(plhs[0]);
    mwIndex* outprowAt = mxGetIr(plhs[0]);
    mwIndex* outpcolindexAt = mxGetJc(plhs[0]);

    // temp[j + 1] stores the number of nonzero elements in row j of A
    std::vector<mwSize> temp(M + 1, 0); 
    for(mwIndex i = 0; i != N; ++i) {
        for(mwIndex j = pcolindexA[i]; j < pcolindexA[i + 1]; ++j)
            ++temp[prowA[j] + 1];
    }
    outpcolindexAt[0] = 0;
    for(mwIndex i = 1; i <= M; ++i) {
        outpcolindexAt[i] = outpcolindexAt[i - 1] + temp[i];
        temp[i] = outpcolindexAt[i];
    }
    for(mwIndex i = 0; i != N; ++i) {
        for(mwIndex j = pcolindexA[i]; j < pcolindexA[i + 1]; ++j) {
            outprowAt[temp[prowA[j]]] = i;
            outAt[temp[prowA[j]]++] = A[j];
        }
    }
}
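
If you want to try the mex file yourself, a minimal sketch of how to compile and sanity-check it could look like this (assuming the code above is saved as transpose_sparse.cpp; the file name, and hence the function name, is arbitrary):

mex transpose_sparse.cpp          % older MATLAB versions may need -largeArrayDims
X = sprand(7984, 12411, 0.01);    % random sparse test matrix
B = transpose_sparse(X);
isequal(B, X.')                   % should return logical 1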

Comparing this algorithm with Matlab's implementation of transpose, we observe similar execution times. Note that this algorithm can be modified in a straightforward way to eliminate the temp array.

Upvotes: 3

Da Kuang

Reputation: 892

After a week's exploration, my guess is that the internal mechanism Matlab uses to transpose a sparse matrix is sorting.

Suppose A is a sparse matrix,

[m, n] = size(A);
[I, J, S] = find(A);               % row indices, column indices, values
[sorted_I, idx] = sort(I);         % reorder the entries by row index
J = J(idx);
S = S(idx);
B = sparse(J, sorted_I, S, n, m);  % rows and columns swap roles

Then B is the transpose of A.

On my machine, the above implementation runs at about half the speed of Matlab's built-in transpose. Considering that Matlab's built-in functions are multi-threaded, my guess seems reasonable.
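
For reference, one way to reproduce the comparison on a random matrix of roughly the size used in the question (the density here is an arbitrary choice, and the timings will of course depend on the machine and MATLAB version):

X = sprand(7984, 12411, 0.01);

tic
[I, J, S] = find(X);
[sorted_I, idx] = sort(I);
B = sparse(J(idx), sorted_I, S(idx), size(X, 2), size(X, 1));
t_sort = toc

tic
C = X.';
t_builtin = toc

isequal(B, C)   % should return logical 1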

Upvotes: 10

Mikhail

Reputation: 21749

I agree with what Roger Rowland mentioned in the comments. To ground this suggestion, you can check some of the functions in the BLAS interface, which MATLAB uses for matrix operations. I'm not sure which implementation it uses, but since MathWorks uses Intel IPP for image handling, I suppose they may well use Intel MKL to make matrix operations fast.

And here is the documentation for the mkl_?cscsv function, which solves a system of linear equations for a sparse matrix in CSC format. Note the transa input flag, which explicitly specifies whether the supplied matrix should be treated as transposed or not.

Upvotes: 1
