Boreal Coder

Reputation: 107

Numpy argsort vs Scipy.stats rankdata

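I've recently used both of these functions, and am looking for input from anyone who can speak to the following:

1. Do argsort and rankdata differ fundamentally in their purpose?
2. Are there performance advantages with one over the other? (specifically: large vs small array performance differences?)
3. What is the memory overhead associated with importing rankdata?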

Thanks in advance.

P.S. I could not create the new tags 'argsort' or 'rankdata'. If anyone with sufficient standing feels they should be added to this question, please do.

Upvotes: 7

Views: 3594

Answers (1)

ntg

Reputation: 14135

Do argsort and rankdata differ fundamentally in their purpose?

In my opinion, they do, slightly. argsort returns the indices that would sort the data (from which you can derive each element's position in the sorted order), while rankdata returns the rank of each element. The difference becomes apparent when there are ties:

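import numpy as np
from scipy import stats

a = np.array([5, 0.3, 0.4, 1, 1, 1, 3, 42])
# Invert the argsort permutation: each element gets its 0-based position in sorted order.
almost_ranks = np.empty_like(a)
almost_ranks[np.argsort(a)] = np.arange(len(a))
print(almost_ranks)        # 0-based positions
print(almost_ranks + 1)    # 1-based; tied values get distinct consecutive positions
print(stats.rankdata(a))   # tied values share the average of their ranks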

This prints (notice 3. 4. 5 vs. 4. 4. 4):

[6. 0. 1. 2. 3. 4. 5. 7.]
[7. 1. 2. 3. 4. 5. 6. 8.]
[7. 1. 2. 4. 4. 4. 6. 8.]
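To make the tie handling more concrete, here is a minimal sketch comparing argsort-derived ranks with a few of rankdata's `method` options. It assumes a SciPy version whose rankdata accepts the `method` argument; `kind='stable'` is passed to argsort (my choice, not part of the code above) so that tied values keep their original order.

```python
import numpy as np
from scipy import stats

a = np.array([5, 0.3, 0.4, 1, 1, 1, 3, 42])

# Ranks derived from argsort: invert the sorting permutation, 1-based.
# kind='stable' keeps tied values in their original order.
order = np.argsort(a, kind='stable')
argsort_ranks = np.empty(len(a), dtype=int)
argsort_ranks[order] = np.arange(1, len(a) + 1)

# rankdata lets you choose how ties are resolved.
ordinal = stats.rankdata(a, method='ordinal')  # ties get distinct ranks, in order of appearance
average = stats.rankdata(a, method='average')  # default: ties share the mean of their ranks
dense = stats.rankdata(a, method='dense')      # ties share a rank, with no gaps afterwards

print(argsort_ranks)
print(ordinal)
print(average)
print(dense)
```

With `method='ordinal'`, rankdata should agree with the argsort-based ranks; the other methods only differ on the three tied 1s (and, for 'dense', on everything ranked after them).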

Are there performance advantages with one over the other? (specifically: large vs small array performance differences?)

Both algorithms seem to me to have the same complexity, O(N log N). I would expect the numpy implementation to be slightly faster, as it has a bit less overhead, plus it's numpy. But you should test this yourself. Checking the code for scipy.stats.rankdata, it seems (at present, in my Python installation) to call np.unique among other functions, so I would guess it takes longer in practice.
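If you want to test this yourself, a rough timing sketch could look like the following. The helper name `time_call`, the array sizes, and the repeat count are arbitrary choices of mine; results will vary with your machine and library versions, so treat the numbers as indicative only.

```python
import timeit

import numpy as np
from scipy import stats


def time_call(func, data, repeats=3):
    """Return the best wall-clock time (seconds) of func(data) over several runs."""
    return min(timeit.repeat(lambda: func(data), number=1, repeat=repeats))


rng = np.random.default_rng(0)  # fixed seed so runs use the same data
timings = {}
for n in (100, 10_000, 1_000_000):
    data = rng.random(n)
    timings[n] = (time_call(np.argsort, data), time_call(stats.rankdata, data))
    print(f"n={n:>9}: argsort {timings[n][0]:.6f}s, rankdata {timings[n][1]:.6f}s")
```

Note that the two calls do not return the same thing (indices vs. ranks), so this compares the cost of each call rather than two equivalent results.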

What is the memory overhead associated with importing rankdata?

Well, importing rankdata imports scipy (and scipy.stats) if you had not already done so, so the overhead is that of loading scipy.
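One rough way to put a number on that overhead is to trace allocations during the import in a fresh interpreter. The sketch below uses the standard-library tracemalloc module; the helper name `import_cost_bytes` is mine. It only counts allocations that tracemalloc can see, so memory taken by loaded shared libraries is not included, and the figure also covers numpy, which scipy imports.

```python
import subprocess
import sys

# Snippet run in a fresh interpreter so that earlier imports do not hide the cost.
SNIPPET = """
import tracemalloc
tracemalloc.start()
from scipy.stats import rankdata
current, peak = tracemalloc.get_traced_memory()
print(current)
"""


def import_cost_bytes():
    """Return bytes of traced allocations still held after importing rankdata."""
    out = subprocess.run(
        [sys.executable, "-c", SNIPPET],
        capture_output=True, text=True, check=True,
    )
    return int(out.stdout.strip())


cost = import_cost_bytes()
print(f"approx. {cost / 1e6:.1f} MB of traced allocations")
```

If your program already imports scipy elsewhere, the additional cost of importing rankdata on top of that should be much smaller than this figure.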

Upvotes: 6
