minks

Reputation: 3039

How can you get the order back after using argsort?

I have an array, for example [-0.7, -3.7, -2.1, -5.8, -1.2], and these numbers correspond to labels that are in order: say -0.7 corresponds to label 201, -3.7 to label 202, and so on.

When I sort them normally, I get [-5.8, -3.7, -2.1, -1.2, -0.7]. I want to pick out the top 3 values, but after sorting I lose track of the labels. To sort them I use np.argsort, which gives me [1,2,0]. This tells me the value with 4 has a low probability while the one with 0 has a high probability.

My question is: with argsort, how can I get my mappings back? How can I tell where my labels are now? Is there a way to keep track of them while using argsort?

Upvotes: 1

Views: 2023

Answers (3)

Garrett R

Reputation: 2662

This makes a copy and uses the built-in sorted function, but I think it achieves what you want.

vals = [-0.7, -3.7, -2.1, -5.8, -1.2]
# sort (index, value) pairs by value
label_inds_vals = sorted(enumerate(vals), key=lambda x: x[1])

The sorted values also come with indices that you can use to index their corresponding label in the label array.
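As a minimal sketch of that lookup: the `labels` list below is an assumption (201 to 205, following the question's "201, 202 and so on"), and `top3` is just an illustrative name.

```python
vals = [-0.7, -3.7, -2.1, -5.8, -1.2]
labels = [201, 202, 203, 204, 205]  # assumed labels, one per value

# Sort (index, value) pairs by value, largest first.
pairs = sorted(enumerate(vals), key=lambda pair: pair[1], reverse=True)

# Use each original index to look up the matching label.
top3 = [(labels[i], v) for i, v in pairs[:3]]
print(top3)
```

Because each pair carries its original index, the labels never need to be sorted themselves.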

If you have a list of lists:

value_lists = [[-0.7, -3.2, -2.1, -5.8, -1.2], [-1.2, -3.2, -3.4, -5.4, -6.4]]

for vals in value_lists:
    # reverse=True gives the largest values first; drop it for the smallest
    label_inds_vals = sorted(enumerate(vals), key=lambda x: x[1], reverse=True)
    print(label_inds_vals[:3])

Upvotes: 2

hpaulj

Reputation: 231540

It's a little unclear what you mean by 'where my labels are now'.

But maybe this use of argsort will help:

In [163]: values=np.array([-0.7, -3.7, -2.1, -5.8, -1.2 ])

Make an array of the labels as well:

In [164]: labels=np.array([200,201,202,203,204])

argsort gives an array of indices, which can be used to reorder both values and labels. Note that this application does not change the original arrays.

In [165]: ind=np.argsort(values)
In [166]: ind
Out[166]: array([3, 1, 2, 4, 0], dtype=int32)
In [167]: values[ind]
Out[167]: array([-5.8, -3.7, -2.1, -1.2, -0.7])
In [168]: labels[ind]
Out[168]: array([203, 201, 202, 204, 200])

If I apply argsort to ind, I get another set of indices that puts the sorted values (or labels) back in their original order.

In [169]: ind1=np.argsort(ind)
In [170]: ind1
Out[170]: array([4, 1, 2, 0, 3], dtype=int32)
In [171]: labels[ind][ind1]
Out[171]: array([200, 201, 202, 203, 204])
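A short script version of the same round trip, in case it helps to see it outside the interactive session. The names `inverse`, `sorted_values` and `restored` are mine, not from the session above.

```python
import numpy as np

values = np.array([-0.7, -3.7, -2.1, -5.8, -1.2])

ind = np.argsort(values)    # positions that sort `values` in ascending order
inverse = np.argsort(ind)   # permutation that undoes `ind`

sorted_values = values[ind]
restored = sorted_values[inverse]   # back in the original order

# `inverse` doubles as a rank: inverse[i] is the position of values[i]
# in the sorted array.
print(inverse)
```

So the mapping is never lost: `ind` takes you from sorted position to original position, and `inverse` goes the other way.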

I imagine you are already using an expression like this to get the first 3 values in sorted order (argsort sorts ascending, so these are the 3 smallest):

In [180]: ind[:3]
Out[180]: array([3, 1, 2], dtype=int32)  # locations of the first 3
In [181]: values[ind[:3]]
Out[181]: array([-5.8, -3.7, -2.1])   # the first 3 values
In [182]: labels[ind[:3]]
Out[182]: array([203, 201, 202])   # and their labels
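Since argsort sorts in ascending order, `ind[:3]` picks the three smallest values. If "top 3" means the three largest, one option (a sketch, not the only way) is to reverse the index array first; `top3` is an illustrative name.

```python
import numpy as np

values = np.array([-0.7, -3.7, -2.1, -5.8, -1.2])
labels = np.array([200, 201, 202, 203, 204])

ind = np.argsort(values)   # ascending order
top3 = ind[::-1][:3]       # indices of the three largest values, largest first

print(values[top3])
print(labels[top3])
```

The same index array selects from both `values` and `labels`, so the pairing stays intact.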

Upvotes: 4

Marcus Müller

Reputation: 36412

The typical pattern here is decorate-sort-undecorate.

Basically, you want to sort labels by their value, and not values as such; so make yourself a set of value-label tuples, and sort these:

# zip doesn't copy the elements of the two sequences; it only creates new references to them
tuples = zip(vals, labels)
sorted_tuples = sorted(tuples, key=lambda tup: tup[0])
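Here is the full pattern end to end, including the undecorate step. It is only a sketch: the label values are assumed (201 to 205, as the question suggests), and I sort largest first because the question asks for the top values.

```python
values = [-0.7, -3.7, -2.1, -5.8, -1.2]
labels = [201, 202, 203, 204, 205]  # assumed labels, one per value

# Decorate: pair each value with its label.
pairs = list(zip(values, labels))

# Sort the pairs by value, largest first.
pairs.sort(key=lambda pair: pair[0], reverse=True)

# Undecorate: split the sorted pairs back into two sequences.
sorted_values, sorted_labels = zip(*pairs)
print(sorted_labels[:3])
```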

Now, 6 million entries is not a small amount, but it's also not that much for a modern PC. Still, you might consider something that treats your data more like a raw data table than a Python list, which is extremely flexible but stores references (and these references might be larger than your actual values or labels).

import numpy
# one row per (value, label) pair; note that integer labels are stored as floats here
table = numpy.column_stack((vals, labels))

NumPy gives you a great many methods for working with bigger tables of data.
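For instance, a structured array keeps each value and its label together in one record, so sorting by value carries the label along. This is one possible approach, not necessarily the best for your data; the field names `value` and `label` and the label values are assumptions for illustration.

```python
import numpy as np

vals = [-0.7, -3.7, -2.1, -5.8, -1.2]
labels = [201, 202, 203, 204, 205]  # assumed labels, one per value

# Each record holds a float value and an integer label.
table = np.array(list(zip(vals, labels)),
                 dtype=[("value", "f8"), ("label", "i8")])

# Sort the records by the "value" field (ascending).
table_sorted = np.sort(table, order="value")

print(table_sorted["label"])
```

Unlike a plain 2-D array, this keeps the labels as integers rather than converting them to floats.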

Upvotes: 1
