Reputation: 41
How does this function work?
import numpy as np
first_names = (5,5,5)
last_names = (3,1,2)
x = np.lexsort((first_names, last_names))
print(x)
It gives output [1 2 0] . I assume that the two lists are sorted by the variable last_names. If so, how can number 2 have index 0. 2 is between 1 and 3, so I don't understand how this sorting works. please explain.
Upvotes: 4
Views: 7611
Reputation: 111
Organizes two lists in pairs, by index -> [0, 0, 1, 1, 2, 2 ...] in ascending order, in this case, notice the output:
# idx: 0 1 2 3 4 5 6
a = np.array ([9, 74, 1, 3, 4, 89, 6])
b = np.array ([4, 6, 9, 2, 1, 8, 7])
Out: [2 3 4 6 0 1 5]
The first number is 2, which is the lowest number of a ([]), and will double with 9, since they are of the same index. To recap, the second smallest number of a ([]) is number 3, which will pair with the 2 of b ([]), since they are of the same index!
Upvotes: 0
Reputation: 539
In Layman terms:
Initially, lets sort first_names and last_names separately.
first_names = np.array(['Betsey', 'Shelley', 'Lanell', 'Genesis', 'Margery'])
last_names = np.array(['Battle', 'Brien', 'Plotner', 'Stahl', 'Woolum'])
first_names.sort()
first_names
>>array(['Betsey', 'Genesis', 'Lanell', 'Margery', 'Shelley'], dtype='<U7')
last_names.sort()
last_names
>>array(['Battle', 'Brien', 'Plotner', 'Stahl', 'Woolum'], dtype='<U7')
[first_names[i] + ' ' + last_names[i] for i in range(len(first_names))]
>>array(['Betsey Battle', 'Genesis Brien', 'Lanell Plotner', 'Margery Stahl', 'Shelley Woolum'])
Now lets say we just want to sort names by first_names only
first_names = np.array(['Betsey', 'Shelley', 'Lanell', 'Genesis', 'Margery'])
last_names = np.array(['Battle', 'Brien', 'Plotner', 'Stahl', 'Woolum'])
_ = np.lexsort((last_names, first_names))
[first_names[i] + ' ' + last_names[i] for i in _]
>>['Betsey Battle', 'Genesis Stahl', 'Lanell Plotner', 'Margery Woolum', 'Shelley Brien']
Here, an obvious question arises:
What is the significance of np.lexsort() if it is possible to sort with simply array.sort() method?
If you look carefully at previous 2 outputs, you can find the answer.
For simplicity, Now lets look at another scenario with 2 similar first names.
first_names = np.array(['Betsey', 'Shelley', 'Lanell', 'Betsey', 'Margery'])
last_names = np.array(['Battle', 'Brien', 'Plotner', 'Stahl', 'Woolum'])
Its not possible to sort first names with corresponding last names with simply sort() method, but lexsort() can sort based on input parameters.
_ = np.lexsort((last_names, first_names))
[first_names[i] + ' ' + last_names[i] for i in _]
>>['Betsey Battle', 'Betsey Stahl', 'Lanell Plotner', 'Margery Woolum', 'Shelley Brien']
We can do the same thing with multiple arrays like np.lexsort((last_names, middle_names, first_names))
.
Where arrays will be initially sorted based on first_names
, if there are similar values then by middle_names
, and so on...
Upvotes: 0
Reputation: 51395
Essentially, np.lexsort((first_names, last_names))
says : sort by last_name
first, then sort by first_name
Reading the documentation, and particularly the example located under "Sort two columns of numbers:", reveals a lot. Essentially, you are first sorting by last_name
, which reorders that so that index 1
(whose value is 1) is first, index 2
(whose value is 2) is second, and index 0
(whose value is 3) is third. With this order, the sorted last_name
ends up as (1,2,3)
, i.e. it is sorted. Then, if there were any ties, the corresponding indices in first_name
would be the tie breaker.
For example, consider this case:
first_names = (5,5,4)
last_names = (3,1,1)
There is a tie between index 1
and 2
in last_name
(they both have the value 1), which will be broken by their corresponding indices in first_name
. At indices 1
and 2
of first_name
, index 2
(value 4) is lower than index 1
(value 5), so it will come first. So, the resulting lexsort
will be [2,1,0]
:
np.lexsort((first_names, last_names))
# array([2, 1, 0])
Upvotes: 7
Reputation: 92460
It returns [1, 2, 0]
because the index 1
corresponds to '1' in the last name. 2
corresponds to '2' and 0
corresponds to '3'. Think of the return value as the order of indexes you need to use to sort the array:
last_names[1], last_names[2], last_names[0]
# 1, 2, 3
sorts the array.
Upvotes: 2