user3207120

Reputation: 143

Python: sorting an array with NaNs

Note: I'm using Python and numpy arrays.

I have many arrays which all have two columns and many rows. There are some NaN values in the second column; the first column only has numbers.

I would like to sort each array in increasing order of the second column, keeping the NaN values out of the ordering (they can go at the end). It's a big dataset, so I would rather not have to convert the NaN values into zeros or something similar.

I'd like it to sort like so:

105.  4.
22.   10.
104.  26.
...
...
...
53.   520.
745.  902.
184.  nan
19.   nan

First I tried using ma.fix_invalid, which masks the NaNs and replaces them with 1e+20:

from operator import itemgetter
from numpy import array, genfromtxt, ma

# data.txt has one of the arrays with 2 columns and a bunch of rows.
Data_0_30 = genfromtxt(fname='data.txt')

g = open("iblah.txt", "a")  # saves to file

def Sorted_i_M_W(mass):
    masked = ma.fix_invalid(mass)
    print(array(sorted(masked, key=itemgetter(1))), file=g)

Sorted_i_M_W(Data_0_30)

g.close()
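To see what the first attempt is actually sorting on, here is a small sketch with made-up data (the two rows are illustrative, not from data.txt): ma.fix_invalid returns a masked array in which the NaN is masked and the underlying data slot holds the float fill value, 1e+20.

```python
import numpy as np

# Made-up sample with one NaN in the second column (illustrative data only).
sample = np.array([[105., 4.], [184., np.nan]])

fixed = np.ma.fix_invalid(sample)
print(fixed.mask[:, 1])  # the NaN entry is flagged as masked
print(fixed.data[:, 1])  # the masked slot now holds the default float fill value
```
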

I also tried replacing the function with this:

def Sorted_i_M_W(mass):
    sortedmass = sorted(mass, key=itemgetter(1))
    print(array(sortedmass), file=g)

For each attempt I got something like:

...
[  4.46800000e+03   1.61472200e+11]
[  3.72700000e+03   1.74166300e+11]
[  4.91800000e+03   1.75502300e+11]
[  6.43500000e+03              nan]
[  3.95520000e+04   8.38907500e+09]
[  3.63750000e+04   1.27625700e+10]
[  2.08810000e+04   1.28578500e+10]
...

At the position of the NaN value, the sorting starts over.

(With fix_invalid, the NaN in the excerpt above shows up as 1.00000000e+20 instead.) But I'd like the sorting to ignore the NaN values completely.
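A likely explanation for the "restart": every ordering comparison involving NaN returns False, in both directions, so Python's sorted(), which only asks whether one key is less than another, cannot place a NaN consistently. A minimal demonstration with made-up values:

```python
nan = float("nan")

# Every ordering comparison with NaN is False, in both directions.
print(nan < 1.0, nan > 1.0, nan == nan)  # False False False

# Because of that, sorted() can leave the numbers out of order around a NaN.
values = [3.0, nan, 1.0, 2.0]
result = sorted(values)
print(result)
```
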

What's the easiest way to sort this array the way I want?

Upvotes: 14

Views: 24146

Answers (5)

Bi Rico

Reputation: 25823

If you're using an older version of NumPy and don't want to upgrade (or if you want code that supports older versions of NumPy), you can do:

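import numpy as np

def nan_argsort(a):
    # Sort on a copy in which NaNs are replaced by +inf, so they end up last.
    temp = a.copy()
    temp[np.isnan(a)] = np.inf
    return temp.argsort()

# a is the two-column array; avoid shadowing the built-in sorted().
sorted_a = a[nan_argsort(a[:, 1])]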

In NumPy 1.4.0 and later, np.sort and np.argsort already sort NaN values to the end. If you need to use Python's built-in sort for some reason, you can write your own comparison function as described in another answer.
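A usage sketch of the helper on a few rows copied from the question's desired output (the row selection and order are made up for the example). One design caveat: a genuine +inf in the column would tie with the NaN rows, since both become +inf in the temporary copy.

```python
import numpy as np

def nan_argsort(a):
    # Replace NaNs by +inf in a copy, then argsort, so NaN rows come last.
    temp = a.copy()
    temp[np.isnan(a)] = np.inf
    return temp.argsort()

# Sample rows taken from the question's expected output, shuffled.
a = np.array([[745., 902.], [19., np.nan], [105., 4.], [53., 520.], [22., 10.]])
sorted_a = a[nan_argsort(a[:, 1])]
print(sorted_a)
```
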

Upvotes: 2

Yuri Kovalev

Reputation: 659

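You can use a comparison function:

from functools import cmp_to_key
from math import isnan

def cmpnan(x, y):
    if isnan(x[1]) and isnan(y[1]):
        return 0   # two NaNs count as equal
    elif isnan(x[1]):
        return 1   # x is "larger"
    elif isnan(y[1]):
        return -1  # x is "smaller"
    else:
        return (x[1] > y[1]) - (x[1] < y[1])  # compare numbers (cmp() is gone in Python 3)

sorted(data, key=cmp_to_key(cmpnan))  # data is the two-column array

See http://docs.python.org/2.7/library/functions.html#sorted. In Python 3, sorted() no longer accepts a cmp argument, so the function is wrapped with functools.cmp_to_key.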
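As an alternative to a comparison function (a swapped-in technique, not this answer's original code), a key function can express the same ordering: sort on the pair (is the value NaN?, value). False sorts before True, so NaN rows go last, while numbers compare normally among themselves. The rows below are made up; nan_last_key is a hypothetical helper name.

```python
import math

# Made-up rows of (first column, second column).
data = [(105.0, 4.0), (184.0, float("nan")), (53.0, 520.0), (22.0, 10.0)]

def nan_last_key(row):
    # False < True, so rows whose second value is NaN sort after all numbers.
    return (math.isnan(row[1]), row[1])

result = sorted(data, key=nan_last_key)
print(result)
```
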

Upvotes: 2

Saullo G. P. Castro

Reputation: 58915

You can create a masked array:

import numpy as np

a = np.loadtxt('test.txt')

mask = np.isnan(a)
masked = np.ma.masked_array(a, mask=mask)  # NaN entries are masked out

Then sort the rows of a using the masked second column:

a[np.argsort(masked[:, 1])]
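np.ma.masked_invalid builds the same kind of mask (NaN and also ±inf) in one call. A sketch assuming a small made-up array in place of test.txt; MaskedArray.argsort treats masked entries as the largest values by default, so those rows land at the end.

```python
import numpy as np

# Made-up two-column data standing in for test.txt.
a = np.array([[105., 4.], [19., np.nan], [53., 520.], [22., 10.]])

col = np.ma.masked_invalid(a[:, 1])  # masks the NaN (and any inf) entries
order = col.argsort()                # masked entries are ordered last by default
print(a[order])
```
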

Upvotes: 5

Alessandro Mariani

Reputation: 1221

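If you'd rather not use NumPy's sorting functions, you can sort the row indices by the second column with Python's sorted() and then index your array with them. Because every comparison with NaN is False, the key has to push the NaN rows to the end explicitly, here with an (is-NaN, value) pair.

It can be done in one line like this:

import numpy as np
yourarray[sorted(range(len(yourarray)), key=lambda k: (np.isnan(yourarray[k, 1]), yourarray[k, 1]))]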
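A different technique, sketched for comparison: separate the NaN rows with a boolean mask, sort only the rows that have a real value, and append the NaN rows afterwards. This keeps the NaNs entirely out of the ordering, as the question asks; the array below is made up.

```python
import numpy as np

# Made-up two-column data.
arr = np.array([[105., 4.], [19., np.nan], [53., 520.], [184., np.nan], [22., 10.]])

is_nan = np.isnan(arr[:, 1])
numeric = arr[~is_nan]
numeric = numeric[np.argsort(numeric[:, 1])]  # sort only rows with a real value
result = np.vstack([numeric, arr[is_nan]])    # NaN rows keep their original order
print(result)
```
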

Upvotes: 0

alko

Reputation: 48317

Not sure if it can be done with numpy.sort, but you can use numpy.argsort for sure:

>>> arr
array([[ 105.,    4.],
       [  53.,  520.],
       [ 745.,  902.],
       [  19.,   nan],
       [ 184.,   nan],
       [  22.,   10.],
       [ 104.,   26.]])
>>> arr[np.argsort(arr[:,1])]
array([[ 105.,    4.],
       [  22.,   10.],
       [ 104.,   26.],
       [  53.,  520.],
       [ 745.,  902.],
       [  19.,   nan],
       [ 184.,   nan]])
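One caveat on the example above: np.argsort's default algorithm is not stable, so rows with equal keys (including the NaN rows) are not guaranteed to keep their input order. Passing kind="stable" (available in NumPy 1.15 and later) guarantees it. A sketch using the same data as this answer:

```python
import numpy as np

arr = np.array([[105., 4.], [53., 520.], [745., 902.], [19., np.nan],
                [184., np.nan], [22., 10.], [104., 26.]])

order = np.argsort(arr[:, 1], kind="stable")  # ties and NaNs keep input order
print(arr[order])
```
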

Upvotes: 7
