Numpy argsort vs Scipy.stats rankdata

Question

I've recently used both of these functions, and am looking for input from anyone who can speak to the following:

Do argsort and rankdata differ fundamentally in their purpose?
Are there performance advantages with one over the other? (Specifically: large vs small array performance differences?)
What is the memory overhead associated with importing rankdata?

Thanks in advance.

p.s. I could not create the new tags 'argsort' or 'rankdata'. If anyone with sufficient standing feels they should be added to this question, please do.

ntg · Accepted Answer

Do argsort and rankdata differ fundamentally in their purpose?

In my opinion, they do, slightly. argsort gives you the indices that would sort the data (from which you can derive each element's position in the sorted order), while rankdata gives you the rank of each element. The difference becomes apparent when there are ties:

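import numpy as np
from scipy import stats

a = np.array([5, 0.3, 0.4, 1, 1, 1, 3, 42])

# Invert the argsort permutation: element i gets its 0-based position in sorted order
almost_ranks = np.empty_like(a)
almost_ranks[np.argsort(a)] = np.arange(len(a))

print(almost_ranks)       # 0-based positions
print(almost_ranks + 1)   # 1-based positions, comparable to ranks
print(stats.rankdata(a))  # ranks; tied values share the average rank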

This prints (notice 3. 4. 5 vs. 4. 4. 4):

[6. 0. 1. 2. 3. 4. 5. 7.]
[7. 1. 2. 3. 4. 5. 6. 8.]
[7. 1. 2. 4. 4. 4. 6. 8.]
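As a further illustration of the tie handling, here is a minimal sketch that compares the argsort-based ranks with a few of rankdata's `method` options. It assumes a SciPy version whose rankdata accepts the `method` keyword; the variable names are only illustrative, and `kind="stable"` is passed to argsort so that tied values keep their original order.

```python
import numpy as np
from scipy import stats

a = np.array([5, 0.3, 0.4, 1, 1, 1, 3, 42])

# 1-based ranks built from argsort; a stable sort keeps tied values
# in their order of appearance.
order = np.argsort(a, kind="stable")
argsort_ranks = np.empty(len(a), dtype=int)
argsort_ranks[order] = np.arange(1, len(a) + 1)

ordinal = stats.rankdata(a, method="ordinal")  # distinct ranks, ties by order of appearance
average = stats.rankdata(a)                    # default: tied values share the average rank
lowest = stats.rankdata(a, method="min")       # tied values all get the lowest rank of the group
dense = stats.rankdata(a, method="dense")      # like "min", but no gaps after a tie group

for name, ranks in [("argsort", argsort_ranks), ("ordinal", ordinal),
                    ("average", average), ("min", lowest), ("dense", dense)]:
    print(f"{name:>8}: {ranks}")
```

With this input, the "ordinal" method should agree with the argsort-based ranks, while the other methods differ only on the three tied values (and, for "dense", on everything ranked after them).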

Are there performance advantages with one over the other? (specifically: large vs small array performance differences?)

Both algorithms seem to me to have the same complexity: O(N log N). I would expect the NumPy implementation to be slightly faster, as it has a bit less overhead, plus it's NumPy. But you should test this yourself. Checking the code for scipy.stats.rankdata, it seems (at present, in my Python installation) to call np.unique among other functions, so I would guess it takes longer in practice.
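One way to test this yourself is a small timeit comparison. The following is only a sketch: the helper `time_call`, the array sizes, and the repeat counts are arbitrary choices of mine, and the numbers will depend on your machine and on your NumPy and SciPy versions.

```python
import timeit

import numpy as np
from scipy import stats


def time_call(func, arr, repeats=3, number=10):
    """Return the best observed average time per call, in seconds."""
    timer = timeit.Timer(lambda: func(arr))
    return min(timer.repeat(repeat=repeats, number=number)) / number


rng = np.random.default_rng(0)  # fixed seed so runs use the same data
results = {}
for n in (10, 1_000, 100_000):
    arr = rng.random(n)
    results[n] = (time_call(np.argsort, arr), time_call(stats.rankdata, arr))
    print(f"n={n:>7}: argsort {results[n][0]:.2e} s, rankdata {results[n][1]:.2e} s")
```

Keep in mind that the two calls do not return the same thing, so for a like-for-like comparison you may also want to time the extra step that turns the argsort result into ranks.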

What is the memory overhead associated with importing rankdata?

Well, you import SciPy if you had not done so before, so the overhead is that of importing SciPy.
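If you want a rough number, here is a sketch that estimates the cost of the import with the standard-library tracemalloc module. It assumes NumPy is already in use (so it is imported first and not counted) and that scipy.stats has not yet been imported in the process. tracemalloc only tracks memory allocated through Python's allocator, so treat the figure as a lower bound rather than the full footprint.

```python
import sys
import tracemalloc

import numpy as np  # imported first so that only SciPy's extra cost is measured

modules_before = set(sys.modules)
tracemalloc.start()

from scipy import stats  # the import whose overhead we want to estimate

current_bytes, peak_bytes = tracemalloc.get_traced_memory()
tracemalloc.stop()

new_modules = set(sys.modules) - modules_before
print(f"modules added by the import: {len(new_modules)}")
print(f"Python-level memory still held: {current_bytes / 1e6:.1f} MB")
print(f"peak during the import: {peak_bytes / 1e6:.1f} MB")
```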
