Reputation: 71
I want to sort a string array using numpy by the length of the elements.
>>> import numpy as np
>>> arr = ["year","month","eye","i","stream","key","house"]
>>> x = np.sort(arr, axis=-1, kind='mergesort')
>>> print(x)
['eye' 'house' 'i' 'key' 'month' 'stream' 'year']
But it sorts them in alphanumeric order. How can I sort them using numpy by their length?
Upvotes: 4
Views: 11717
Reputation: 231605
If I expand your list to arr1=arr*1000, the Python list sort using len as the key function is fastest.
In [77]: len(arr1)
Out[77]: 7000
In [78]: timeit sarr=sorted(arr1,key=len)
100 loops, best of 3: 3.03 ms per loop
In [79]: %%timeit
arrA=np.array(arr1)
larr=[len(i) for i in arrA] # list comprehension works same as map
sarr=arrA[np.argsort(larr)]
....:
100 loops, best of 3: 7.77 ms per loop
Converting the list to an array takes about 1 ms (that conversion adds significant overhead, especially for small lists). Even with an already created array and np.char.str_len, the time is still slower than the Python sort.
In [83]: timeit sarr=arrA[np.argsort(np.char.str_len(arrA))]
100 loops, best of 3: 6.51 ms per loop
While the np.char functions can be convenient, they still basically iterate over the array, applying the corresponding str method to each element.
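As a small check of that point, here is a minimal sketch comparing np.char.str_len with a plain list comprehension over the same array. The variable names lengths_np and lengths_py are just illustrative, not from the timings above.

```python
import numpy as np

arr = np.array(["year", "month", "eye", "i", "stream", "key", "house"])

# np.char.str_len returns an integer array holding the length of each string.
lengths_np = np.char.str_len(arr)

# The same lengths computed with an explicit Python-level loop.
lengths_py = [len(s) for s in arr]

print(lengths_np.tolist())
print(lengths_py)
```

Both give the same values, so the choice between them is mostly about convenience rather than a different algorithm.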
In general, argsort gives you much of the same power as the key function.
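To make that comparison concrete, the sketch below computes the keys up front and uses them with argsort, next to the equivalent sorted(..., key=len) call. It assumes a NumPy version that accepts kind="stable" (1.15 or later); the names keys, order and by_length are my own.

```python
import numpy as np

arr = ["year", "month", "eye", "i", "stream", "key", "house"]
arr_np = np.array(arr)

# Compute the "key" (here, the string length) for every element up front.
keys = np.char.str_len(arr_np)

# A stable sort keeps equal-length strings in their original order,
# which matches the behaviour of Python's sorted().
order = np.argsort(keys, kind="stable")
by_length = arr_np[order]

print(by_length)
print(sorted(arr, key=len))
```

Any other per-element key works the same way: build an array of key values, argsort it, and index the original array with the result.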
Upvotes: 1
Reputation: 33532
Add a helper array containing the lengths of the strings, then use numpy's argsort, which gives you the indices that would sort the data according to these lengths. Index the original data with these indices:
import numpy as np
arr = np.array(["year","month","eye","i","stream","key","house"]) # np-array needed for later indexing
arr_ = [len(s) for s in arr] # list of lengths; a bare map() is lazy on py3 and argsort cannot use it directly
x = arr[np.argsort(arr_)] # reorder the original array by ascending length
print(x)
Upvotes: 3