Markus

Reputation: 2455

Cope with different slicing-behaviour in scipy.sparse and numpy

Setup

I'm aware that sparse matrices in the scipy.sparse module differ from numpy arrays. I'm also aware of other questions about slicing sparse matrices. However, those and most other questions deal with the performance of slicing.

My question is instead about how to cope with their different slicing behaviour. Let's create an example:

import numpy as np
from scipy import sparse

matrix = np.asarray([[0,0,0,1], [1,1,0,0], [1,0,1,0], [1,0,0,1], [1,0,0,1], [1,0,0,1]])
sparse_matrix = sparse.lil_matrix(matrix) # Or another format like .csr_matrix etc.

Given this setup, applying the same slice results in a different output:

matrix[:, 3]
# Output: 
# array([ True, False, False,  True,  True,  True], dtype=bool)

sparse_matrix[:, 3]
# Output:
# matrix([[ True],
#        [False],
#        [False],
#        [ True],
#        [ True],
#        [ True]], dtype=bool)

Question

This is a bit of a bummer, since I need the first output in the second case as well. I know that converting with sparse_matrix.A etc. will get me the desired result. However, converting the sparse matrix to an array defeats the purpose of using sparse matrices in the first place.

So is there a way to get the same slice result without converting the sparse matrix to an array?

Edit: For clarification, since my question might be confusing in this regard: a slice of sparse_matrix should produce the same output as the same slice of matrix, meaning that something like sparse_matrix[:, 3] should output ([ True, False, False, True, True, True]).

Upvotes: 0

Views: 251

Answers (1)

hpaulj

Reputation: 231385

In [150]: arr = np.asarray([[0,0,0,1], [1,1,0,0], [1,0,1,0], [1,0,0,1], [1,0,0,1], [1,0,0,1]]) 
     ...: M = sparse.lil_matrix(arr) # Or another format like .csr_matrix etc. 

A scalar index on an ndarray reduces the number of dimensions by one:

In [151]: arr[:,3]                                                                                           
Out[151]: array([1, 0, 0, 1, 1, 1])
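
As a small sketch of that rule (the variable names col_1d and col_2d are made up for illustration), compare a scalar column index with a list or slice index on the same array. Only the scalar index drops the axis:

```python
import numpy as np

arr = np.asarray([[0,0,0,1], [1,1,0,0], [1,0,1,0], [1,0,0,1], [1,0,0,1], [1,0,0,1]])

col_1d = arr[:, 3]        # scalar index: the column axis is dropped
col_2d = arr[:, [3]]      # list index: the column axis is kept
col_2d_slice = arr[:, 3:4]  # slice index: the column axis is kept as well

print(col_1d.shape)        # (6,)
print(col_2d.shape)        # (6, 1)
print(col_2d_slice.shape)  # (6, 1)
```

So the (6, 1) shape that the sparse slice produces corresponds to arr[:, [3]] on the dense side, not to arr[:, 3].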

The same index does not change the number of dimensions of the sparse matrix:

In [152]: M[:,3]                                                                                             
Out[152]: 
<6x1 sparse matrix of type '<class 'numpy.int64'>'
    with 4 stored elements in LInked List format>

This behavior is similar to that of the np.matrix subclass (and MATLAB). A sparse matrix is always 2d.
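
A quick way to check this, assuming the lil_matrix class used above (the names col, row and dense_col are only illustrative), is to look at the shapes after indexing with a scalar along either axis:

```python
import numpy as np
from scipy import sparse

arr = np.asarray([[0,0,0,1], [1,1,0,0], [1,0,1,0], [1,0,0,1], [1,0,0,1], [1,0,0,1]])
M = sparse.lil_matrix(arr)

col = M[:, 3]   # scalar column index, result is still a 2d sparse matrix
row = M[3, :]   # scalar row index, result is still a 2d sparse matrix

print(col.shape)  # (6, 1)
print(row.shape)  # (1, 4)

# np.matrix keeps two dimensions in the same way
dense_col = np.asmatrix(arr)[:, 3]
print(dense_col.shape)  # (6, 1)
```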

The dense array display of this matrix:

In [153]: M[:,3].A                                                                                           
Out[153]: 
array([[1],
       [0],
       [0],
       [1],
       [1],
       [1]], dtype=int64)

and the np.matrix display:

In [154]: M[:,3].todense()                                                                                   
Out[154]: 
matrix([[1],
        [0],
        [0],
        [1],
        [1],
        [1]], dtype=int64)

np.matrix has an A1 property, which produces a 1d array (it converts to ndarray and applies ravel):

In [155]: M[:,3].todense().A1                                                                                
Out[155]: array([1, 0, 0, 1, 1, 1], dtype=int64)

ravel, squeeze and scalar indexing are all ways of reducing the number of dimensions of an ndarray. But they don't have that effect directly on an np.matrix or a sparse matrix, so the slice has to be converted to an ndarray first.
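
Putting these pieces together, one possible approach is to slice first and densify only the slice. The helper below, column_as_1d, is a hypothetical name and not part of scipy; it is a minimal sketch that assumes a scipy sparse matrix that supports indexing (lil or csr, for example):

```python
import numpy as np
from scipy import sparse

def column_as_1d(sparse_matrix, j):
    """Return column j of a sparse matrix as a 1d ndarray (illustrative helper)."""
    # Slice while still sparse, then densify only the (n, 1) column and flatten it.
    return sparse_matrix[:, j].toarray().ravel()

arr = np.asarray([[0,0,0,1], [1,1,0,0], [1,0,1,0], [1,0,0,1], [1,0,0,1], [1,0,0,1]])
M = sparse.lil_matrix(arr)

col = column_as_1d(M, 3)
print(col)        # [1 0 0 1 1 1]
print(col.shape)  # (6,)
```

Only the single column is converted to a dense array here, so the full matrix stays sparse.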

Another example of a 2d sparse matrix:

In [156]: sparse.lil_matrix(arr[:,3])                                                                        
Out[156]: 
<1x6 sparse matrix of type '<class 'numpy.int64'>'
    with 4 stored elements in LInked List format>
In [157]: _.A                                                                                                
Out[157]: array([[1, 0, 0, 1, 1, 1]], dtype=int64)

Note the [[...]]: sparse has added a leading size-1 dimension to the 1d ndarray.
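
As a sketch of what that means in practice (row_vec, col_vec and flat are illustrative names), a 1d array passed to lil_matrix comes back as a 1x6 row, which can be transposed into a column or flattened again after densifying:

```python
import numpy as np
from scipy import sparse

arr = np.asarray([[0,0,0,1], [1,1,0,0], [1,0,1,0], [1,0,0,1], [1,0,0,1], [1,0,0,1]])

row_vec = sparse.lil_matrix(arr[:, 3])  # 1d input becomes a 1x6 sparse row
col_vec = row_vec.T                     # transpose to get a 6x1 sparse column
flat = row_vec.toarray().ravel()        # back to the original 1d shape

print(row_vec.shape)  # (1, 6)
print(col_vec.shape)  # (6, 1)
print(flat.shape)     # (6,)
```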

Upvotes: 1
