Reputation: 633
It looks like sorting NumPy structured and record arrays by a single column is much slower than sorting a similar standalone array:
In [111]: a = np.random.rand(10000)
In [112]: b = np.random.rand(10000)
In [113]: rec = np.rec.fromarrays([a,b])
In [114]: timeit rec.argsort(order='f0')
100 loops, best of 3: 18.8 ms per loop
In [115]: timeit a.argsort()
1000 loops, best of 3: 891 µs per loop
There is a marginal improvement using the structured array, but it's not dramatic:
In [120]: struct = np.empty(len(a),dtype=[('a','f8'),('b','f8')])
In [121]: struct['a'] = a
In [122]: struct['b'] = b
In [124]: timeit struct.argsort(order='a')
100 loops, best of 3: 15.8 ms per loop
This indicates that it's potentially faster to create an index array from argsort and then use that to reorder the individual arrays. This is OK except that I expect to be dealing with very large arrays and would like to avoid copying data as much as possible. Is there a more efficient way of doing this that I'm missing?
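For reference, the index-array approach described above could look roughly like the following. This is only a sketch: the seeded generator and the names idx, a_sorted and b_sorted are illustrative and not part of the original session. Note that indexing with an integer array allocates a new array each time, which is exactly the copying concern raised in the question.

```python
import numpy as np

# Illustrative data; the seed is an assumption made for reproducibility.
rng = np.random.default_rng(0)
a = rng.random(10_000)
b = rng.random(10_000)

# Sort on the plain array, which is the fast path.
idx = a.argsort()

# Reorder each array with the same index; each line allocates a new array.
a_sorted = a[idx]
b_sorted = b[idx]
```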
Upvotes: 9
Views: 2489
Reputation: 7840
As Jaime has said, you can use argsort on the field to sort the record array:
inds = np.argsort(rec['f0'])
Then use take to avoid making a copy:
np.take(rec, inds, out=rec)
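Putting the two steps together on a tiny record array may make the effect easier to see. This is a minimal sketch with made-up values; only argsort and take come from the answer. One caveat: the NumPy documentation notes that out is buffered when mode='raise' (the default), so a temporary may still be allocated internally even though the result lands in rec.

```python
import numpy as np

# Small made-up example; field names default to 'f0' and 'f1'.
a = np.array([3.0, 1.0, 2.0])
b = np.array([30.0, 10.0, 20.0])
rec = np.rec.fromarrays([a, b])

# Sort key computed from the single field, not via order=.
inds = np.argsort(rec['f0'])

# Write the reordered records back into rec itself.
np.take(rec, inds, out=rec)

print(rec['f0'])
print(rec['f1'])
```

Rows stay paired: each f1 value moves together with its f0 value.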
Upvotes: 3
Reputation: 67427
What's slowing you down is the use of order, not the fact that you have a record array. If you want to sort by a single field, do it like this:
In [12]: %timeit np.argsort(rec['f0'])
1000 loops, best of 3: 829 us per loop
Once order is used, performance goes south no matter how many fields you want to sort by:
In [16]: %timeit np.argsort(rec, order=['f0'])
10 loops, best of 3: 27.9 ms per loop
In [17]: %timeit np.argsort(rec, order=['f0', 'f1'])
10 loops, best of 3: 28.4 ms per loop
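To reproduce the comparison outside IPython, something like the following should work. It is a sketch under assumptions: the seed, the array size and the repeat count are arbitrary, and absolute timings will vary by machine and NumPy version. It also checks that both calls return the same ordering, which holds here because the f0 values are distinct.

```python
import timeit
import numpy as np

# Illustrative data; seed and size are assumptions.
rng = np.random.default_rng(0)
n = 10_000
rec = np.rec.fromarrays([rng.random(n), rng.random(n)])

# Two ways to get the sort order by field 'f0'.
by_field = np.argsort(rec['f0'])          # sorts a plain float64 view
by_order = np.argsort(rec, order=['f0'])  # sorts whole records

# Time each variant; number=20 is an arbitrary repeat count.
reps = 20
t_field = timeit.timeit(lambda: np.argsort(rec['f0']), number=reps)
t_order = timeit.timeit(lambda: np.argsort(rec, order=['f0']), number=reps)

print(f"rec['f0'] view: {t_field / reps * 1e3:.3f} ms per call")
print(f"order=['f0']:   {t_order / reps * 1e3:.3f} ms per call")
```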
Upvotes: 4