Reputation: 7841

Sorting in Sparse Matrix

I have a sparse matrix. I need to sort this matrix row-by-row and create another [sparse] matrix. Code may explain it better:

# for `rand` function, you need newer version of scipy.
from scipy.sparse import *
m = rand(6,6, density=0.6)
d = m.getrow(0)
print d

Output1

(0, 5) 0.874881629788 
(0, 4) 0.352559852239 
(0, 2) 0.504791645463 
(0, 1) 0.885898140175

I have this m matrix. I want to create a new matrix with sorted version of m. The new matrix contains 0'th row like this.

new_d = new_m.getrow(0)
print new_d

Output2

(0, 1) 0.885898140175
(0, 5) 0.874881629788  
(0, 2) 0.504791645463
(0, 4) 0.352559852239

So I can obtain which column is bigger etc:

print new_d.indices

Output3

array([1, 5, 2, 4])

Of course every row should be sorted like above independently.

I have one solution for this problem but it is not elegant.

Upvotes: 9

Answers (2)

Alexander Measure

Reputation: 894

If you're willing to ignore the zero-value elements of the matrix, the code below should work. It is also much faster than implementations that use the getrow method, which is rather slow.

def sort_coo(m):
    tuples = zip(m.row, m.col, m.data)
    return sorted(tuples, key=lambda x: (x[0], x[2]))

For example:

    >>> from numpy.random import rand
    >>> from scipy.sparse import coo_matrix
    >>>
    >>> d = rand(10, 20)
    >>> d[d > .05] = 0
    >>> s = coo_matrix(d)
    >>> sort_coo(s)
    [(0, 2, 0.004775589084940246),
     (3, 12, 0.029941507166614145),
     (5, 19, 0.015030386789436245),
     (7, 0, 0.0075044957259399192),
     (8, 3, 0.047994403933129481),
     (8, 5, 0.049401058471327031),
     (9, 15, 0.040011608000125043),
     (9, 8, 0.048541825332137023)]

Depending on your needs you may want to tweak the sort keys in the lambda or further process the output. If you want everything in a row indexed dictionary you could do:

from collections import defaultdict

sorted_rows = defaultdict(list)

for i in sort_coo(m):
     sorted_rows[i[0]].append((i[1], i[2]))

Upvotes: 8

Baskaya

Reputation: 7841

My bad solution is like this:

from scipy.sparse import coo_matrix
import numpy as np
a = []
for i in xrange(m.shape[0]): # assume m is square matrix.
   d = m.getrow(i)
   n = len(d.indices)
   s = zip([i]*n, d.indices, d.data)
   sorted_s = sorted(s, key=lambda v: v[2], reverse=True)
   a.extend(sorted_s)
a = np.array(a)
new_m = coo_matrix((a[:,2], (a[:,0], a[:,1])), m.shape)

There can be some simple mistakes above because I have not checked it yet. But the idea is intuitive, I guess. Is there any good solution?

Edit

This new matrix creation may be useless because if you call getrow method then the order is broken again. Only coo_matrix.col keeps the order.

Another Solution

This one is not exact solution but it may be helpful:

def sortSparseMatrix(m, rev=True, only_indices=True):

    """ Sort a sparse matrix and return column index dictionary
    """
    col_dict = dict() 
    for i in xrange(m.shape[0]): # assume m is square matrix.
        d = m.getrow(i)
        s = zip(d.indices, d.data)
        sorted_s = sorted(s, key=lambda v: v[1], reverse=True)
        if only_indices:
            col_dict[i] = [element[0] for element in sorted_s]
        else:
            col_dict[i] = sorted_s
    return col_dict

>>> print sortSparseMatrix(m)
{0: [5, 1, 0],
 1: [1, 3, 5],
 2: [1, 2, 3, 4],
 3: [1, 5, 2, 4],
 4: [0, 3, 5, 1],
 5: [3, 4, 2]}

Upvotes: 2