oli

Reputation: 1

Reshaping a sparse CSR matrix into 1 dimension

I have been trying to reshape my matrix:

array([<320000x799928 sparse matrix of type '<class 'numpy.float64'>' with 2929143 stored elements in Compressed Sparse Row format>], dtype=object)

into a 1-dimensional array, as I want to feed it into a neural network. None of the usual transformations work: I tried reshaping, flattening, .todense, and .toarray.

Any idea what could be going on here?

Upvotes: 0

Views: 786

Answers (1)

hpaulj

Reputation: 231395

Something that displays as:

array([<320000x799928 sparse matrix of type '<class 'numpy.float64'>' with 2929143 stored elements in Compressed Sparse Row format>], dtype=object)

is a single-element (shape (1,)) numpy array with object dtype. The element is a sparse matrix, but the array itself is not.

Starting with a small sparse matrix A, I can make an array that displays like yours:

In [101]: arr = np.array([A])

In [102]: arr
Out[102]: 
array([<3x3 sparse matrix of type '<class 'numpy.float64'>'
        with 3 stored elements in Compressed Sparse Row format>],
      dtype=object)

In [103]: arr.shape
Out[103]: (1,)

This is already a 1d array, but not a numeric one.

I can access that element with:

In [104]: arr[0]
Out[104]: 
<3x3 sparse matrix of type '<class 'numpy.float64'>'
    with 3 stored elements in Compressed Sparse Row format>

In [105]: print(arr[0])
  (0, 0)    1.0
  (1, 1)    1.0
  (2, 2)    1.0
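
A minimal sketch of the unwrapping step, in case it helps to guard against this situation in a script. The helper name unwrap_sparse is hypothetical (not part of numpy or scipy), and the object array is built with np.empty so the example does not depend on how np.array treats sparse inputs:

```python
import numpy as np
from scipy import sparse

# Small stand-in for the 320000x799928 matrix in the question.
A = sparse.csr_matrix(np.eye(3))

# A 1-element object array holding the sparse matrix, like the one displayed above.
arr = np.empty(1, dtype=object)
arr[0] = A


def unwrap_sparse(obj):
    """Return the sparse matrix held in a 1-element object array (hypothetical helper)."""
    if isinstance(obj, np.ndarray) and obj.dtype == object and obj.size == 1:
        obj = obj.item()  # pull the single element out of the wrapper
    if not sparse.issparse(obj):
        raise TypeError(f"expected a sparse matrix, got {type(obj)!r}")
    return obj


M = unwrap_sparse(arr)
print(arr.shape, arr.dtype)  # shape and dtype of the wrapper array
print(M.shape)               # shape of the sparse matrix inside it
```

Calling reshape or flatten on arr only rearranges the one-element wrapper; the sparse methods have to be called on M.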

And apply toarray (or todense) to it:

In [106]: arr[0].toarray()
Out[106]: 
array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

todense will make a np.matrix instead of an ndarray.

Once it's an ndarray it can be flattened:

In [107]: arr[0].toarray().ravel()
Out[107]: array([1., 0., 0., 0., 1., 0., 0., 0., 1.])
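
For comparison, a short sketch of the usual ways to flatten the dense result, using the same small stand-in matrix. This only makes sense when the dense array fits in memory:

```python
import numpy as np
from scipy import sparse

A = sparse.csr_matrix(np.eye(3))  # small stand-in matrix
dense = A.toarray()               # 2d ndarray, shape (3, 3)

flat_view = dense.ravel()         # 1d; a view when the data is contiguous
flat_copy = dense.flatten()       # 1d; always a copy
flat_auto = dense.reshape(-1)     # 1d; numpy works out the length

print(flat_view.shape)
print(np.shares_memory(dense, flat_view), np.shares_memory(dense, flat_copy))
```

All three give the same values; they differ only in whether the result shares memory with dense.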

The sparse matrix itself can be reshaped to a 1-row matrix, but as long as it's sparse it has to remain 2d:

In [109]: arr[0].reshape(1,9)
Out[109]: 
<1x9 sparse matrix of type '<class 'numpy.float64'>'
    with 3 stored elements in COOrdinate format>

In [110]: arr[0].reshape(1,9).A
Out[110]: array([[1., 0., 0., 0., 1., 0., 0., 0., 1.]])
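
If only the positions of the nonzeros in the flattened layout are needed, they can be computed without ever making a dense array. A sketch, assuming row-major (C) ordering; the variable names are illustrative:

```python
import numpy as np
from scipy import sparse

A = sparse.csr_matrix(np.eye(3))  # small stand-in matrix
n_rows, n_cols = A.shape

# Still sparse and still 2d, just with a single row.
row = A.reshape(1, n_rows * n_cols)

# Flat (row-major) index of each stored element, from the COO coordinates.
# Cast to int64 first: for the matrix in the question, row * n_cols
# would overflow 32-bit indices.
coo = A.tocoo()
flat_idx = coo.row.astype(np.int64) * n_cols + coo.col

print(row.shape, row.nnz)
print(np.sort(flat_idx))
```

The pair (flat_idx, coo.data) carries the same information as the flattened dense vector, at a cost proportional to the number of stored elements.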

np.matrix has a property that returns a raveled 1d array:

In [115]: arr[0].todense().A1
Out[115]: array([1., 0., 0., 0., 1., 0., 0., 0., 1.])

memory

But a big caution about using toarray (or todense): with those dimensions the dense array would take about 2 TB, too big for most memory:

In [118]: 320000*799928*8/1e9
Out[118]: 2047.81568

It works as a sparse matrix because only a small fraction of the values are nonzero:

In [119]: 2929143/(320000*799928)
Out[119]: 1.1442994713274194e-05
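
The same arithmetic as a small script, with a rough estimate of the CSR footprint added for contrast. The CSR figure is an assumption-laden sketch: it assumes 32-bit index arrays and counts only the data, indices, and indptr arrays, ignoring Python object overhead:

```python
import numpy as np

# Dimensions and stored-element count from the question.
n_rows, n_cols, nnz = 320_000, 799_928, 2_929_143
itemsize = np.dtype(np.float64).itemsize  # 8 bytes per value

# Dense storage: one float64 per cell.
dense_gb = n_rows * n_cols * itemsize / 1e9

# Fraction of cells that are actually stored.
density = nnz / (n_rows * n_cols)

# Rough CSR storage: values + column indices + row pointers.
index_size = 4  # assumed int32 indices
csr_gb = (nnz * itemsize + nnz * index_size + (n_rows + 1) * index_size) / 1e9

print(f"dense: {dense_gb:.1f} GB, CSR: {csr_gb:.3f} GB, density: {density:.2e}")
```

Checking an estimate like dense_gb before calling toarray is a cheap way to avoid a MemoryError.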

Upvotes: 2
