Kore-N
Kore-N

Reputation: 145

Simple way to get index of sorted matrix

I would like to sort an entire matrix and get as a result a vector of indices. In R this can be simply achieved through the command

order("matrix")

In Python the analogous command would be:

np.argsort("matrix", axis=None)

The problem is that while R recognizes the index of a vector also for matrices (i.e. matrix[1] gives a number, not a row), in Python we seem to need always both indeces.

So I had to follow the steps in this question. But I need to change the code, since zip() seems to work correctly only together with list() in Python 3+ (or am I wrong?). The new code would be

import numpy as np    
a = np.array([[1,4,3],[2,8,0]])
n = a.shape[1]
d = list(zip( np.argsort(a, axis=None).__divmod__(n)))
d = list(zip(d[0][0], d[1][0]))

But the R code is so simple! Hence I wonder whether it is not possible to find a similarly simple command in Python. Or whether there is some similarly convenient way of treating a matrix also as a vector.

Upvotes: 3

Views: 1689

Answers (2)

hpaulj
hpaulj

Reputation: 231355

Your d is

In [362]: d
Out[362]: [(1, 2), (0, 0), (1, 0), (0, 2), (0, 1), (1, 1)]

argsort with None produces the sorted indices for a flattened version of the array:

In [366]: np.argsort(a, axis=None)
Out[366]: array([5, 0, 3, 2, 1, 4], dtype=int32)
In [368]: a.flat[_]           # apply this index to flattened a
Out[368]: array([0, 1, 2, 3, 4, 8])

np.argsort(a.flat) does the same thing. (or use a.ravel() if that makes more sense to you.)

We can unravel the 1d indices:

In [369]: np.unravel_index(Out[366], a.shape)
Out[369]: 
(array([1, 0, 1, 0, 0, 1], dtype=int32),
 array([2, 0, 0, 2, 1, 1], dtype=int32))

This uses the same sort of divmod calculation that you do. It can be used to index a directly:

In [370]: a[_]
Out[370]: array([0, 1, 2, 3, 4, 8])

transpose can turn that tuple of arrays into a 2d array similar to your d:

In [371]: np.transpose(__)
Out[371]: 
array([[1, 2],
       [0, 0],
       [1, 0],
       [0, 2],
       [0, 1],
       [1, 1]], dtype=int32)

(np.argwhere does the same thing with the results of np.where.)

Or in one line

In [375]: np.transpose(np.unravel_index(np.argsort(a, axis=None), a.shape)).tolist()
Out[375]: [[1, 2], [0, 0], [1, 0], [0, 2], [0, 1], [1, 1]]

Note that is type of list of index pairs cannot be used to index a. Out[369] is the correct form for indexing items in a 2d array. I'm not familiar with R, but it looks like its indexing style may be more like MATLAB.

Upvotes: 2

user6655984
user6655984

Reputation:

in Python we seem to need always both indeces

This is not true. You can access the elements of a 2D array (or an N-D array) with a flat index, like the indexes 5, 0, 3, 2, 1, 4 returned by argsort. This can be done by using ravel, which creates a one-dimensional view into your n-dimensional array.

a = np.array([[1, 4, 3], [2, 8, 0]])
ix = np.argsort(a, axis=None)
print(a.ravel()[ix])

This prints [0, 1, 2, 3, 4, 8], the array elements are returned according to the indexes that argsort produced.

This can also be used for writing into the array: for example,

a.ravel()[ix[0]] = 42

replaces the smallest element of the array with 42.

Note that passing axis=None makes argsort implicitly flattened the array before sorting, an equivalent of ravel method.

Upvotes: 2

Related Questions