BenB

Reputation: 1050

Sorting two lists -- one is a list of lists

I have two numpy arrays. One is N by M, the other is N by 1. I want to be able to sort the first list by any one of its M columns, and I want the two lists to stay aligned (i.e., if rows 1 and 15 of list1 are swapped, rows 1 and 15 of list2 should be swapped too).

For example:

import numpy as np
a = np.array([[1,6],[3,4],[2,5]])
b = np.array([[.5],[.8],[.2]])

Then, I'd like to be able to sort by, say, the first element of each row in a to give:

a = [[1,6],[2,5],[3,4]]
b = [[.5],[.2],[.8]]

or to sort by, say, the second element of each row in a to give:

a = [[3,4],[2,5],[1,6]]
b = [[.8],[.2],[.5]]

I see lots of similar problems in which both lists are one-dimensional, e.g., this question, or questions about sorting lists of lists, e.g., this one. But I can't find what I'm looking for.

Eventually I got this to work:

import numpy as np
a = np.array([[1,6],[3,4],[2,5]])
b = np.array([[.5],[.8],[.2]])
# list() so the zip result can be indexed on Python 3 as well
package = list(zip(a, b))
print(package[0][1])
sortedpackage = sorted(package, key=lambda dim: dim[0][1])
d, e = zip(*sortedpackage)
print(d)
print(e)

Now this produces d and e as I want:

  d = [[3,4],[2,5],[1,6]]
  e = [[.8],[.2],[.5]]

But I don't understand why. Printing package[0][1] gives 0.5, which is not the element I'm sorting by. Why is this? Is what I'm doing robust?

Upvotes: 2

Views: 191

Answers (3)

Jared

Reputation: 26397

The reason package[0][1] returns 0.5 is that it indexes into your list of tuples as a whole, whereas sorted applies the key to each individual element of the given iterable.

You zip a and b in package:

[([1, 6], [0.5]),
 ([3, 4], [0.8]),
 ([2, 5], [0.2])]

It is at this point that you print package[0][1]. The first element is obtained with package[0] = ([1, 6], [0.5]). The next index, [1], selects the second element of that tuple, [0.5] (which is b[0]), so you get 0.5.
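A minimal sketch of that indexing, written for Python 3 (hence list(zip(...))); the variable name first is only for illustration. It confirms that package[0][1] is b[0], while the value the sort key reads sits one level deeper, at package[0][0][1].

```python
import numpy as np

a = np.array([[1, 6], [3, 4], [2, 5]])
b = np.array([[.5], [.8], [.2]])

# list() is needed on Python 3, where zip returns a lazy iterator
package = list(zip(a, b))

first = package[0]   # the tuple (a[0], b[0])
print(first[1])      # b[0], the one-element array holding 0.5
print(first[0][1])   # a[0][1], the value the sort key actually reads
```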

sorted, on the other hand, examines the elements of the iterable one at a time: it computes the key for ([1, 6], [0.5]), then for ([3, 4], [0.8]), and so on.

So when you specify a key with a lambda function, you are really saying: for each element of the iterable, take the value at [0][1]. That is, sort by the second value of the first element of each tuple (the second column of a).
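To check this reading of the key, here is a sketch that wraps the same zip/sorted/unzip steps in a function that takes the column to sort by. The name sort_pair_by_column is hypothetical, and the code assumes Python 3 and that both arrays have the same number of rows.

```python
import numpy as np

def sort_pair_by_column(a, b, col):
    """Hypothetical helper: sort rows of a by column col, reordering b the same way."""
    package = list(zip(a, b))
    # Each item is (row_of_a, row_of_b); item[0][col] picks column col of a's row
    ordered = sorted(package, key=lambda item: item[0][col])
    rows_a, rows_b = zip(*ordered)
    # zip(*...) yields tuples of 1-D rows; stack them back into 2-D arrays
    return np.array(rows_a), np.array(rows_b)

a = np.array([[1, 6], [3, 4], [2, 5]])
b = np.array([[.5], [.8], [.2]])

d, e = sort_pair_by_column(a, b, 1)
print(d)
print(e)
```

Because sorted is stable, rows with equal keys keep their original relative order.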

Upvotes: 2

thkang

Reputation: 11543

Inside your package:

package[0] is (a[0], b[0]); thus, package[0][1] is b[0].

Your package is triple-nested. key=lambda dim: dim[0][1] means you use item[0][1] as the key to sort package. package consists of items, and each item is double-nested.

To see which element you're sorting by, use package[x][0][1], where x is the index of that item.
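A small illustration of this suggestion, assuming Python 3: it lists package[x][0][1] for every index x, then sorts the indices themselves by that key. For this data, the resulting index order matches what np.argsort(a[:, 1]) returns.

```python
import numpy as np

a = np.array([[1, 6], [3, 4], [2, 5]])
b = np.array([[.5], [.8], [.2]])
package = list(zip(a, b))

# package[x][0][1] is the sort key of item x, i.e. a[x][1]
keys = [int(package[x][0][1]) for x in range(len(package))]
print(keys)

# Sorting the indices by that key gives the order sorted() puts the items in
order = sorted(range(len(package)), key=lambda x: package[x][0][1])
print(order)
```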

Upvotes: 1

jfs

Reputation: 414149

To apply the same sort order to several numpy arrays, you could use np.argsort(). For example, to sort by the second column:

indices = a[:,1].argsort()
print(a[indices])
print(b[indices])

Output:

[[3 4]
 [2 5]
 [1 6]]

[[ 0.8]
 [ 0.2]
 [ 0.5]]
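The same argsort idea generalizes to any column. This sketch is an illustration rather than part of the answer: the helper name sort_by_column is hypothetical, and kind="stable" (available in NumPy 1.15 and later) is an added choice so that ties keep their original order, as they do with sorted.

```python
import numpy as np

def sort_by_column(a, b, col):
    """Hypothetical helper: reorder rows of a and b by column col of a."""
    # kind="stable" keeps tied rows in their original relative order,
    # matching the behaviour of Python's sorted()
    indices = np.argsort(a[:, col], kind="stable")
    # Indexing both arrays with the same index array keeps them aligned row by row
    return a[indices], b[indices]

a = np.array([[1, 6], [3, 4], [2, 5]])
b = np.array([[.5], [.8], [.2]])

for col in range(a.shape[1]):
    sa, sb = sort_by_column(a, b, col)
    print(sa.tolist(), sb.tolist())
```

For a descending sort, reversing the index array (indices[::-1]) before indexing would work.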

Upvotes: 2
