Reputation: 1111
I'd like to rank the values in a NumPy array without changing their positions. I was able to do this with the NumPy code below, but it ranks the 'NaN' values as well. How can I get it to ignore them and rank only the real number values? Any help is much appreciated. Thanks!
Here is my code:
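import numpy as np

# Read the tab-separated file into a list of rows
hr = []
for line in open('file.txt', 'r'):
    hr.append(line.strip().split('\t'))

# Skip the header row and keep columns 1 through 12 of each row
tf = []
for i in range(1, len(hr)):
    print(hr[i][1:13])
    tf.append(hr[i][1:13])

# Rank each row with a double argsort
for rows in range(0, len(tf)):
    array = np.array([tf[rows]], dtype=float)
    print(array)
    order = array.argsort()
    ranks = order.argsort()
    print(ranks)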
Each row of tf looks something like this:
array=['NaN', '20', '383.333', 'NaN', 'NaN', 'NaN', '5', '100', '129', '122.5', 'NaN', 'NaN']
Desired output:
ranks=array(['NaN', 1, 5, 'NaN', 'NaN', 'NaN', 0, 2, 4, 3, 'NaN', 'NaN'])
Actual output with code above:
ranks=array([ 6, 3, 4, 7, 8, 9, 5, 0, 2, 1, 10, 11])
I'm new to Python, so any help is appreciated!
Upvotes: 2
Views: 3863
Reputation: 879691
If you have scipy, mstats.rankdata basically does what you want:
import scipy.stats.mstats as mstats
import numpy as np
# list(...) is needed on Python 3, where map returns an iterator
array = np.array(list(map(float, ['NaN', '20', '383.333', 'NaN', 'NaN', 'NaN', '5', '100', '129', '122.5', 'NaN', 'NaN'])))
np.ma.masked_invalid masks the nan values. mstats.rankdata ranks the non-masked values and assigns 0 to the masked values.
ranks = mstats.rankdata(np.ma.masked_invalid(array))
print(ranks)
# [ 0. 2. 6. 0. 0. 0. 1. 3. 5. 4. 0. 0.]
Now we just spruce it up a bit to get the desired output:
ranks[ranks == 0] = np.nan
ranks -= 1
print(ranks)
# [ nan 1. 5. nan nan nan 0. 2. 4. 3. nan nan]
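If SciPy is not available, roughly the same result can be sketched with plain NumPy by ranking only the non-NaN entries and writing the ranks back into their original positions. This is a minimal sketch, not part of the approach above: the helper name rank_ignoring_nan is hypothetical, and it assumes the input is a flat sequence of numbers or numeric strings such as one row of tf.

```python
import numpy as np

def rank_ignoring_nan(values):
    """Return 0-based ranks of the non-NaN entries; NaN positions stay NaN.

    Hypothetical helper for illustration only.
    """
    # float('NaN') yields nan, so numeric strings and 'NaN' both convert cleanly
    arr = np.array([float(v) for v in values])
    ranks = np.full(arr.shape, np.nan)   # start with NaN everywhere
    valid = ~np.isnan(arr)               # True where there is a real number
    # Double argsort over the valid entries only gives their 0-based ranks
    ranks[valid] = arr[valid].argsort().argsort()
    return ranks

row = ['NaN', '20', '383.333', 'NaN', 'NaN', 'NaN', '5', '100', '129', '122.5', 'NaN', 'NaN']
ranks = rank_ignoring_nan(row)
print(ranks)
```

For the example row this puts 1, 5, 0, 2, 4, 3 in the positions of the real numbers and leaves nan everywhere else. One difference to keep in mind: the double argsort breaks ties by position, whereas mstats.rankdata averages the ranks of tied values.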
Upvotes: 4