Reputation: 778
I have two arrays A and B. In NumPy you can use A as an index to B e.g.
A = np.array([[1,2,3,1,7,3,1,2,3],[4,5,6,4,5,6,4,5,6],[7,8,9,7,8,9,7,8,9]])
B= np.array([1,2,3,4,5,6,7,8,9,0])
c = B[A]
Which produces:
[[2 3 4 2 8 4 2 3 4] [5 6 7 5 6 7 5 6 7] [8 9 0 8 9 0 8 9 0]]
However, in my case the arrays A and B are SciPy CSR sparse arrays and they don't seem to support indexing.
A_sparse = sparse.csr_matrix(A)
B_sparse = sparse.csr_matrix(B)
c = B_sparse[A_sparse]
This results in:
IndexError: Indexing with sparse matrices is not supported except boolean indexing where matrix and index are equal shapes.
I've come up with the function below to replicate NumPy's behavior with the sparse arrays:
def index_sparse(A,B):
A_sparse = scipy.sparse.coo_matrix(A)
B_sparse = sparse.csr_matrix(B)
res = sparse.csr_matrix(A_sparse)
for i,j,v in zip(A_sparse.row, A_sparse.col, A_sparse.data):
res[i,j] = B_sparse[0, v]
return res
res = index_sparse(A, B)
print res.todense()
Looping over the array and having to create a new array in Python isn't ideal. Is there a better way of doing this using built-in functions from SciPy/ NumPy?
Upvotes: 1
Views: 2126
Reputation: 231385
Sparse indexing is less developed. coo
format for example doesn't implement it at all.
I haven't tried to implement this problem, though I have answered others that involve working with the sparse format attributes. So I'll just make some general observations.
B_sparse
is a matrix, so its shape is (1,10)
. So the equivalent to B[A]
is
In [294]: B_sparse[0,A]
Out[294]:
<3x9 sparse matrix of type '<class 'numpy.int32'>'
with 24 stored elements in Compressed Sparse Row format>
In [295]: _.A
Out[295]:
array([[2, 3, 4, 2, 8, 4, 2, 3, 4],
[5, 6, 7, 5, 6, 7, 5, 6, 7],
[8, 9, 0, 8, 9, 0, 8, 9, 0]], dtype=int32)
B_sparse[A,:]
or B_sparse[:,A]
gives a 3d warning, since it would be trying to create a matrix version of:
In [298]: B[None,:][:,A]
Out[298]:
array([[[2, 3, 4, 2, 8, 4, 2, 3, 4],
[5, 6, 7, 5, 6, 7, 5, 6, 7],
[8, 9, 0, 8, 9, 0, 8, 9, 0]]])
As to your function:
A_sparse.nonzero()
does A_sparse.tocoo()
and returns its row
and col
. Effectively the same as what you do.
Here's something that should be faster, though I haven't tested it enough to be sure it is robust:
In [342]: Ac=A_sparse.tocoo()
In [343]: res=Ac.copy()
In [344]: res.data[:]=B_sparse[0, Ac.data].A[0]
In [345]: res
Out[345]:
<3x9 sparse matrix of type '<class 'numpy.int32'>'
with 27 stored elements in COOrdinate format>
In [346]: res.A
Out[346]:
array([[2, 3, 4, 2, 8, 4, 2, 3, 4],
[5, 6, 7, 5, 6, 7, 5, 6, 7],
[8, 9, 0, 8, 9, 0, 8, 9, 0]], dtype=int32)
In this example there are 2 zeros that could cleaned up as well (look at res.nonzero()
).
Since you are setting each res[i,j]
with values from Ac.row
and Ac.col
, res
has the same row,col
values as Ac
, so I initialize it as a copy. Then it's just a matter of updating the res.data
attribute. It would be faster to index Bc.data
directly, but that doesn't account for its sparsity.
Upvotes: 1