George
George

Reputation: 41

What is the sorting logic behind np.lexsort?

How does this function work?

import numpy as np
first_names = (5,5,5)
last_names = (3,1,2)
x = np.lexsort((first_names, last_names))
print(x)

It gives output [1 2 0] . I assume that the two lists are sorted by the variable last_names. If so, how can number 2 have index 0. 2 is between 1 and 3, so I don't understand how this sorting works. please explain.

Upvotes: 4

Views: 7611

Answers (4)

l12
l12

Reputation: 111

Organizes two lists in pairs, by index -> [0, 0, 1, 1, 2, 2 ...] in ascending order, in this case, notice the output:

# idx:         0   1   2  3  4  5  6
a = np.array ([9, 74, 1, 3, 4, 89, 6])
b = np.array ([4, 6, 9, 2, 1, 8, 7])

Out: [2 3 4 6 0 1 5]

The first number is 2, which is the lowest number of a ([]), and will double with 9, since they are of the same index. To recap, the second smallest number of a ([]) is number 3, which will pair with the 2 of b ([]), since they are of the same index!

Upvotes: 0

palash
palash

Reputation: 539

In Layman terms:
Initially, lets sort first_names and last_names separately.

first_names = np.array(['Betsey', 'Shelley', 'Lanell', 'Genesis', 'Margery'])
last_names = np.array(['Battle', 'Brien', 'Plotner', 'Stahl', 'Woolum'])
first_names.sort()
first_names

>>array(['Betsey', 'Genesis', 'Lanell', 'Margery', 'Shelley'], dtype='<U7')

last_names.sort()
last_names

>>array(['Battle', 'Brien', 'Plotner', 'Stahl', 'Woolum'], dtype='<U7')

[first_names[i] + ' ' + last_names[i] for i in range(len(first_names))]

>>array(['Betsey Battle', 'Genesis Brien', 'Lanell Plotner', 'Margery Stahl', 'Shelley Woolum'])

Now lets say we just want to sort names by first_names only

first_names = np.array(['Betsey', 'Shelley', 'Lanell', 'Genesis', 'Margery'])
last_names = np.array(['Battle', 'Brien', 'Plotner', 'Stahl', 'Woolum'])
_ = np.lexsort((last_names, first_names))
[first_names[i] + ' ' + last_names[i] for i in _]

>>['Betsey Battle', 'Genesis Stahl', 'Lanell Plotner', 'Margery Woolum', 'Shelley Brien']


Here, an obvious question arises:
What is the significance of np.lexsort() if it is possible to sort with simply array.sort() method?
If you look carefully at previous 2 outputs, you can find the answer.

For simplicity, Now lets look at another scenario with 2 similar first names.

first_names = np.array(['Betsey', 'Shelley', 'Lanell', 'Betsey', 'Margery'])
last_names = np.array(['Battle', 'Brien', 'Plotner', 'Stahl', 'Woolum'])

Its not possible to sort first names with corresponding last names with simply sort() method, but lexsort() can sort based on input parameters.

_ = np.lexsort((last_names, first_names))
[first_names[i] + ' ' + last_names[i] for i in _]

>>['Betsey Battle', 'Betsey Stahl', 'Lanell Plotner', 'Margery Woolum', 'Shelley Brien']

We can do the same thing with multiple arrays like np.lexsort((last_names, middle_names, first_names)).
Where arrays will be initially sorted based on first_names, if there are similar values then by middle_names, and so on...

Upvotes: 0

sacuL
sacuL

Reputation: 51395

Essentially, np.lexsort((first_names, last_names)) says : sort by last_name first, then sort by first_name

Reading the documentation, and particularly the example located under "Sort two columns of numbers:", reveals a lot. Essentially, you are first sorting by last_name, which reorders that so that index 1 (whose value is 1) is first, index 2 (whose value is 2) is second, and index 0 (whose value is 3) is third. With this order, the sorted last_name ends up as (1,2,3), i.e. it is sorted. Then, if there were any ties, the corresponding indices in first_name would be the tie breaker.

For example, consider this case:

first_names = (5,5,4)
last_names = (3,1,1)

There is a tie between index 1 and 2 in last_name (they both have the value 1), which will be broken by their corresponding indices in first_name. At indices 1 and 2 of first_name, index 2 (value 4) is lower than index 1 (value 5), so it will come first. So, the resulting lexsort will be [2,1,0]:

np.lexsort((first_names, last_names))
# array([2, 1, 0])

Upvotes: 7

Mark
Mark

Reputation: 92460

It returns [1, 2, 0] because the index 1 corresponds to '1' in the last name. 2 corresponds to '2' and 0 corresponds to '3'. Think of the return value as the order of indexes you need to use to sort the array:

last_names[1], last_names[2], last_names[0] 
# 1, 2, 3

sorts the array.

Upvotes: 2

Related Questions