Paul Floyd

Reputation: 6916

In-place sorting of csc_matrix columns

I want to sort the columns of a scipy sparse matrix. The scipy documentation is fairly terse, and I can't see much about modifying a matrix in place. On SO I found this post, but the answer given returns a list.

The code that I want to write is

from operator import attrgetter
from scipy.sparse import rand

s = rand(4, 4, density=0.25, format='csc')

_, colSize = s.get_shape()
for j in range(0, colSize):
    s.setcol(j, sorted(s.getcol(j), key=attrgetter('data'), reverse=True))

Except there is no setcol and sorted doesn't return the same type as getcol.

As an example of what I'd like to get, if the input is

<class 'scipy.sparse.csc.csc_matrix'>
[[ 0.          0.33201655  0.          0.        ]
 [ 0.          0.          0.          0.        ]
 [ 0.          0.81332962  0.          0.50794041]
 [ 0.          0.41478979  0.          0.        ]]

then the output that I want is

[[ 0.          0.81332962  0.          0.50794041]
 [ 0.          0.41478979  0.          0.        ]
 [ 0.          0.33201655  0.          0.        ]
 [ 0.          0.          0.          0.        ]]

(It doesn't have to be a csc matrix; I assumed that this would be better for column manipulations.)

Upvotes: 1

Views: 238

Answers (1)

Warren Weckesser

Reputation: 114841

Here's a short function that sorts the columns in descending order in-place:

import numpy as np


def sort_csc_cols(m):
    """
    Sort the columns of m in descending order.

    m must be a csc_matrix whose nonzero values are all positive.
    m is modified in-place.
    """
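    seq = np.arange(m.shape[0])
    for k in range(m.indptr.size - 1):
        # data[start:end] holds the stored values of column k.
        start, end = m.indptr[k:k + 2]
        # Sorting the reversed view ascending leaves the slice descending.
        m.data[start:end][::-1].sort()
        # Move the sorted values to the top rows 0..n-1 of the column.
        m.indices[start:end] = seq[:end - start]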
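The function works directly on the three arrays that back a csc_matrix (indptr, indices and data). As a rough illustration of that layout, here is a small sketch that builds the 4x4 matrix from the question and prints those arrays; the variable name dense is only for this example.

```python
import numpy as np
from scipy.sparse import csc_matrix

# The 4x4 example from the question, written out as a dense array.
dense = np.array([
    [0.0, 0.33201655, 0.0, 0.0],
    [0.0, 0.0,        0.0, 0.0],
    [0.0, 0.81332962, 0.0, 0.50794041],
    [0.0, 0.41478979, 0.0, 0.0],
])
s = csc_matrix(dense)

# indptr[k]:indptr[k + 1] delimits column k's slice of indices and data.
print(s.indptr)   # column boundaries
print(s.indices)  # row index of each stored value
print(s.data)     # stored values, column by column
```

Column 1 occupies positions 0 to 2 of data and indices, column 3 occupies position 3, and the empty columns have equal consecutive indptr entries. Sorting a column therefore only needs to reorder its slice of data and rewrite its slice of indices.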

For example, s is a csc_matrix:

In [47]: s
Out[47]: 
<8x12 sparse matrix of type '<class 'numpy.int64'>'
    with 19 stored elements in Compressed Sparse Column format>

In [48]: s.A
Out[48]: 
array([[ 0,  2,  0,  0,  7,  0,  0, 48,  0,  0,  0,  0],
       [ 0,  0, 82,  0,  0, 38, 67, 17,  9,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0, 47,  0],
       [ 0,  0,  0,  0,  0,  0, 99,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0,  0, 83,  0,  0,  0,  9],
       [ 0,  0,  0,  0,  0,  0, 85, 94,  0, 55, 68,  0],
       [ 0,  0,  0,  0,  0,  0, 22,  0,  0,  0, 71,  0],
       [ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0]])

In [49]: sort_csc_cols(s)

In [50]: s.A
Out[50]: 
array([[ 0,  2, 82,  0,  7, 38, 99, 94,  9, 55, 71,  9],
       [ 0,  0,  0,  0,  0,  0, 85, 83,  0,  0, 68,  0],
       [ 0,  0,  0,  0,  0,  0, 67, 48,  0,  0, 47,  0],
       [ 0,  0,  0,  0,  0,  0, 22, 17,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0]])
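
To tie this back to the question, here is a self-contained sketch that runs the same function on the 4x4 matrix from the question and compares the result with the output asked for there. The names dense and expected are only for this example, and toarray() is used in place of .A, which I believe is not available in recent SciPy releases.

```python
import numpy as np
from scipy.sparse import csc_matrix


def sort_csc_cols(m):
    """Sort each column of the csc_matrix m in descending order, in-place."""
    seq = np.arange(m.shape[0])
    for k in range(m.indptr.size - 1):
        start, end = m.indptr[k:k + 2]
        # Ascending sort of the reversed view = descending order in place.
        m.data[start:end][::-1].sort()
        # Sorted values go to rows 0..n-1 of the column.
        m.indices[start:end] = seq[:end - start]


# Input matrix from the question.
dense = np.array([
    [0.0, 0.33201655, 0.0, 0.0],
    [0.0, 0.0,        0.0, 0.0],
    [0.0, 0.81332962, 0.0, 0.50794041],
    [0.0, 0.41478979, 0.0, 0.0],
])
# Output the question asks for.
expected = np.array([
    [0.0, 0.81332962, 0.0, 0.50794041],
    [0.0, 0.41478979, 0.0, 0.0],
    [0.0, 0.33201655, 0.0, 0.0],
    [0.0, 0.0,        0.0, 0.0],
])

s = csc_matrix(dense)
sort_csc_cols(s)
result = s.toarray()
print(np.allclose(result, expected))
```

Note that the positivity requirement in the docstring matters: the implicit zeros always end up below the stored values, so a column containing negative entries would not come out in true descending order.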

Upvotes: 2
