zyg

Reputation: 29

Why is the csr_matrix slice A[index,:] so fast in scipy?

Now I need to implement sparse matrix slicing in C++ using Eigen 3.3.8, but I can't make it as fast as scipy in Python.
For example, A is a sparse matrix of shape (1000000, 1000000), and index is a list of the rows I want to select from A.
When I write it like this:

A=A[index,:]

it takes 0.5114 s.
When I write it like this:

def extractor(indices,N):
    indptr=np.arange(len(indices)+1)
    data=np.ones(len(indices))
    shape=(len(indices),N)
    return sparse.csr_matrix((data,indices,indptr), shape=shape)
A=extractor(index,A.shape[0])*A

it takes 3.381 s (this idea is from Sparse matrix slicing using list of int).
So how can I implement a fast csr matrix slice?

Upvotes: 0

Views: 120

Answers (2)

Homer512

Reputation: 13295

As I've previously noted in a comment, the two methods you list are not equivalent. A[index] with a list of indices changes the shape of the output. It can also permute or replicate rows. This can be realized reasonably fast like this:

using CsrMatrixD = Eigen::SparseMatrix<double, Eigen::RowMajor>;

CsrMatrixD rowsFromCsr(const Eigen::Ref<const CsrMatrixD>& in,
                       const Eigen::Ref<const Eigen::VectorXi>& indices)
{
  using Triplet = Eigen::Triplet<double>;
  using InnerIterator = Eigen::Ref<const CsrMatrixD>::InnerIterator;
  std::vector<Triplet> triplets;
  int outrow = 0;
  for(int row: indices) {
    for(InnerIterator nonzero(in, row); nonzero; ++nonzero)
      triplets.emplace_back(outrow, nonzero.col(), nonzero.value());
    ++outrow;
  }
  CsrMatrixD rtrn(outrow, in.cols());
  rtrn.setFromTriplets(triplets.begin(), triplets.end());
  return rtrn;
}

This runs faster if the indices are sorted but should still work with any other arrangement. The InnerIterator may be new in Eigen-3.4.

The second option, where you multiply with a diagonal matrix, preserves the original shape and order of elements. It can be done like this:

CsrMatrixD rowsFromCsr(const Eigen::Ref<const CsrMatrixD>& in,
                       const Eigen::Ref<const Eigen::VectorXi>& indices)
{
    Eigen::VectorXd diag = Eigen::VectorXd::Zero(in.rows());
    for(int i: indices)
      diag[i] = 1.;
    return diag.asDiagonal() * in;
}

This is still reasonably fast. Sadly there doesn't seem to be a version that uses a sparse vector for the diagonal. Using the prune method is faster in my tests:

CsrMatrixD rowsFromCsr(const Eigen::Ref<const CsrMatrixD>& in,
                       const Eigen::Ref<const Eigen::VectorXi>& indices)
{
  std::vector<bool> bitmap(in.rows());
  for(int row: indices)
    bitmap[row] = true;
  CsrMatrixD rtrn = in;
  rtrn.prune([&bitmap](int row, int /*col*/, double /*value*/) noexcept -> bool {
    return bitmap[row];
  });
  rtrn.makeCompressed();
  return rtrn;
}
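For comparison, the same shape-preserving idea can be expressed directly in scipy: multiply by a diagonal 0/1 mask and then drop the stored zeros, which plays the role of `prune()`. A minimal sketch (the helper name `keep_rows_same_shape` is mine, not part of either library):

```python
import numpy as np
from scipy import sparse

def keep_rows_same_shape(A, rows):
    """Zero every row of CSR matrix A not listed in `rows`, keeping A's shape.

    scipy analogue of the diagonal-multiply / prune approaches: build a 0/1
    diagonal mask, multiply, then drop explicit zeros like prune() does.
    """
    mask = np.zeros(A.shape[0])
    mask[rows] = 1.0
    out = (sparse.diags(mask) @ A).tocsr()
    out.eliminate_zeros()  # remove any explicitly stored zeros
    return out
```

Unlike `A[index, :]`, the result here has the same shape as `A`, with the unselected rows simply empty.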

Upvotes: 1

hpaulj

Reputation: 231335

With a more modestly sized matrix, my extractor code is slower, but only by about 2x.

In [4]: M = sparse.random(1000,1000,.1,'csr')

In [5]: M
Out[5]: 
<1000x1000 sparse matrix of type '<class 'numpy.float64'>'
    with 100000 stored elements in Compressed Sparse Row format>

In [6]: idx = np.arange(100)

In [7]: M[idx,:]
Out[7]: 
<100x1000 sparse matrix of type '<class 'numpy.float64'>'
    with 10116 stored elements in Compressed Sparse Row format>

In [8]: timeit M[idx,:]
173 µs ± 514 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

The extractor:

In [10]: def extractor(indices,N):
    ...:     indptr=np.arange(len(indices)+1)
    ...:     data=np.ones(len(indices))
    ...:     shape=(len(indices),N)
    ...:     return sparse.csr_matrix((data,indices,indptr), shape=shape)

The nnz match:

In [13]: extractor(idx, M.shape[0])*M
Out[13]: 
<100x1000 sparse matrix of type '<class 'numpy.float64'>'
    with 10116 stored elements in Compressed Sparse Row format>

Somewhat slower, but not as much as in your example.

In [14]: timeit extractor(idx, M.shape[0])*M
302 µs ± 1.02 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

This difference is larger than what I measured when I tested this 5 years ago.

Since I have other things to do, I'm not going to test other cases, such as a larger M and a longer idx.

You are welcome to explore the scipy.sparse code to see if they have changed the calculation, perhaps streamlining it in ways that my reconstruction doesn't capture.
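At its core, the compiled routine behind `A[idx, :]` copies each selected row's segment of the `data`/`indices` arrays and rebuilds `indptr` from the row lengths. A pure-NumPy sketch of that idea (the function name `csr_take_rows` is mine, not scipy API):

```python
import numpy as np

def csr_take_rows(data, indices, indptr, rows, ncols):
    """Select rows from a CSR matrix given by its raw (data, indices, indptr)
    arrays. Returns the new arrays and the new shape.

    Sketch of what A[rows, :] amounts to: gather each selected row's slice
    of data/indices, and rebuild indptr as the cumulative row lengths.
    """
    rows = np.asarray(rows)
    starts = indptr[rows]          # where each selected row begins
    stops = indptr[rows + 1]       # where each selected row ends
    counts = stops - starts        # nnz per selected row
    new_indptr = np.concatenate(([0], np.cumsum(counts)))
    if len(rows):
        take = np.concatenate([np.arange(s, e) for s, e in zip(starts, stops)])
    else:
        take = np.empty(0, dtype=np.intp)
    return data[take], indices[take], new_indptr, (len(rows), ncols)
```

Because the copy is per-row-segment rather than per-element, and scipy does it in compiled code, the cost is close to a handful of memcpys.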

Upvotes: 1
