quester

Reputation: 564

Simply extending a csr matrix by adding a column

I have this code

import numpy as np
from scipy.sparse import csr_matrix
q = csr_matrix([[1.], [0.]])
ones = np.ones((2, 1))

How can I now add the ones column to matrix q so that the result has shape (2, 2)? (Matrix q is sparse, and I don't want to change its type from csr.)

Upvotes: 0

Views: 1402

Answers (1)

hpaulj

Reputation: 231335

The code for sparse.hstack is

return bmat([blocks], format=format, dtype=dtype)

In bmat, blocks is then a 1xN array. If the blocks are all csc, it uses a fast version of the stack:

A = _compressed_sparse_stack(blocks[0,:], 1)

Conversely, sparse.vstack with csr matrices does

A = _compressed_sparse_stack(blocks[:,0], 0)

In effect, given how data is stored in a csr matrix, it is relatively easy to add rows (or columns for csc). (I can elaborate if that needs explanation.)
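To illustrate why adding rows to a csr is cheap, here is a minimal sketch of a vertical stack done directly on the three attribute arrays. The helper name csr_vstack_sketch is made up for this example and is not the scipy implementation; it assumes both inputs are csr, have the same number of columns, and have data/indices arrays that hold exactly nnz values.

```python
import numpy as np
from scipy.sparse import csr_matrix

def csr_vstack_sketch(a, b):
    """Hypothetical helper: stack two csr matrices vertically by
    concatenating their attribute arrays (no per-row work needed)."""
    assert a.shape[1] == b.shape[1], "column counts must match"
    # Values and column indices of b simply follow those of a.
    data = np.concatenate([a.data, b.data])
    indices = np.concatenate([a.indices, b.indices])
    # b's row pointers are shifted by the number of values stored in a;
    # b.indptr[0] is always 0, so it is dropped to avoid a duplicate boundary.
    indptr = np.concatenate([a.indptr, b.indptr[1:] + a.nnz])
    shape = (a.shape[0] + b.shape[0], a.shape[1])
    return csr_matrix((data, indices, indptr), shape=shape)

a = csr_matrix([[1., 0.], [0., 2.]])
b = csr_matrix([[0., 3.]])
stacked = csr_vstack_sketch(a, b)
print(stacked.toarray())
```

The existing rows are never touched: the new rows are appended to the end of data and indices, and indptr just grows by one entry per new row.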

Otherwise bmat does the following (abridged):

# convert everything to COO format
# calculate total nnz
data = np.empty(nnz, dtype=dtype)
for B in blocks:
    data[nnz:nnz + B.nnz] = B.data
return coo_matrix((data, (row, col)), shape=shape).asformat(format)

In other words, it gets the data, row, and col values for each block, concatenates them, makes a new coo matrix, and finally converts it to the desired format.
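Applied to the question's q and ones, the coo route can be sketched as follows. This is only an illustration of the idea, not the scipy source: the variable names (blocks, col_offsets, manual) are my own, and the dense ones column is wrapped in a sparse matrix explicitly before stacking.

```python
import numpy as np
from scipy import sparse
from scipy.sparse import coo_matrix, csr_matrix

q = csr_matrix([[1.], [0.]])
ones = np.ones((2, 1))

# One-call version: ask hstack for csr output directly.
result = sparse.hstack([q, csr_matrix(ones)], format='csr')

# Hand-rolled version of the same idea: convert every block to coo,
# shift each block's column indices by the width of the blocks before it,
# concatenate, and convert the combined coo back to csr.
blocks = [q.tocoo(), coo_matrix(ones)]
col_offsets = np.cumsum([0] + [b.shape[1] for b in blocks[:-1]])
data = np.concatenate([b.data for b in blocks])
row = np.concatenate([b.row for b in blocks])
col = np.concatenate([b.col + off for b, off in zip(blocks, col_offsets)])
n_cols = sum(b.shape[1] for b in blocks)
manual = coo_matrix((data, (row, col)), shape=(q.shape[0], n_cols)).asformat('csr')

print(result.format, result.shape)
print(result.toarray())
```

Both result and manual are csr matrices of shape (2, 2), so the type of q is preserved from the caller's point of view even though coo is used internally.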

sparse readily converts between formats. Even the display of a matrix can involve a conversion: to coo for the (i,j) d format, to csr for dense/array. The nonzero method converts to coo. Most math converts to csr. A csr is transposed by converting it to a csc (without change of attribute arrays). Much of the conversion is done in compiled code, so you don't see delays.
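A few of these conversions can be checked interactively. The small matrix m below is just an example of mine; the sketch assumes the csr_matrix (spmatrix) interface used in the question.

```python
import numpy as np
from scipy.sparse import csr_matrix

m = csr_matrix([[1., 0., 2.], [0., 3., 0.]])

# Transposing a csr gives a csc built from the same three attribute arrays.
mt = m.T
same_arrays = (np.array_equal(m.data, mt.data)
               and np.array_equal(m.indices, mt.indices)
               and np.array_equal(m.indptr, mt.indptr))
print(mt.format, same_arrays)

# nonzero returns coo-style (row, col) index arrays.
rows, cols = m.nonzero()
print(rows, cols)

# Explicit round trip through coo and back to csr.
back = m.tocoo().tocsr()
print(back.format)
```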

Adding columns directly to csr format is a lot of work. All 3 attribute arrays have to be modified row by row. Again I could go into detail if needed.
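For comparison, here is a rough sketch of what a direct csr column append involves. The function csr_append_dense_column is hypothetical (my own name, not part of scipy). It uses np.insert to do the per-row insertions in one vectorized call rather than an explicit loop, and it stores every value of the new column, including any zeros.

```python
import numpy as np
from scipy.sparse import csr_matrix

def csr_append_dense_column(m, column):
    """Hypothetical helper: append a dense column to a csr matrix by
    editing data, indices and indptr directly."""
    n_rows, n_cols = m.shape
    column = np.asarray(column, dtype=m.dtype).ravel()
    assert column.shape[0] == n_rows, "one value per row is required"
    # Position just past the last stored value of each row.
    ends = m.indptr[1:]
    # One new value per row, placed at the end of that row's segment.
    data = np.insert(m.data, ends, column)
    # The new column index is the largest, so sorted indices stay sorted.
    indices = np.insert(m.indices, ends, n_cols)
    # Row i has gained i extra values before its start, i + 1 before its end.
    indptr = m.indptr + np.arange(n_rows + 1, dtype=m.indptr.dtype)
    return csr_matrix((data, indices, indptr), shape=(n_rows, n_cols + 1))

q = csr_matrix([[1.], [0.]])
extended = csr_append_dense_column(q, np.ones(2))
print(extended.toarray())
```

Every row's segment has to be opened up, and all three arrays are rebuilt, which is why the coo route taken by hstack is usually the more practical choice.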

Upvotes: 1
