Reputation: 362
I am confused about how rank works on a Series. As I understand it, rank is calculated from the highest value to the lowest value in a series, and if two numbers are equal, pandas assigns them the average of their ranks.
In this example, the highest value is 7. Why do we get rank 5.5 for number 7 and rank 1.5 for number 4?
import pandas as pd
S1 = pd.Series([7,6,7,5,4,4])
S1.rank()
Output:
0 5.5
1 4.0
2 5.5
3 3.0
4 1.5
5 1.5
dtype: float64
Upvotes: 4
Views: 2321
Reputation: 1111
You were using the default ranking method, 'average'. If you want tied values to receive the highest rank in their group, pass method='max' as below:
S1 = pd.Series([7,6,7,5,4,4])
S1.rank(method='max')
Here are all the ranking methods supported by pandas:
method : {'average', 'min', 'max', 'first', 'dense'}, default 'average'
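# collect the ranks in a DataFrame; assigning them back into S1 itself
# would add new elements to the Series instead of new columns
df = pd.DataFrame({'value': S1})
df['default_rank'] = S1.rank()
df['max_rank'] = S1.rank(method='max')
df['NA_bottom'] = S1.rank(na_option='bottom')
df['pct_rank'] = S1.rank(pct=True)
print(df)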
Upvotes: 2
Reputation: 150765
As commented by Joachim, the rank function accepts an argument method with default 'average'. That is, the final rank of tied values is the average of the ranks those values would otherwise occupy.
Per the documentation, the options for method are:
method : {'average', 'min', 'max', 'first', 'dense'}, default 'average'
How to rank the group of records that have the same value (i.e. ties):
- average: average rank of the group
- min: lowest rank in the group
- max: highest rank in the group
- first: ranks assigned in order they appear in the array
- dense: like 'min', but rank always increases by 1 between groups
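To see how the five options differ on the series from the question, here is a small sketch that puts them side by side. The names s, methods and comparison are only illustrative.

```python
import pandas as pd

s = pd.Series([7, 6, 7, 5, 4, 4])

# One column per tie-breaking method, plus the original values for reference
methods = ['average', 'min', 'max', 'first', 'dense']
comparison = pd.DataFrame({m: s.rank(method=m) for m in methods})
comparison.insert(0, 'value', s)

print(comparison)
```

Only the tied values (the two 7s and the two 4s) change between columns; 6 and 5 get the same rank under every method except 'dense', which removes the gaps left by ties.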
For example, let's try method='dense'. Then S1.rank(method='dense') gives:
0 4.0
1 3.0
2 4.0
3 2.0
4 1.0
5 1.0
dtype: float64
which is somewhat equivalent to pd.factorize with sort=True, shifted by 1 because factorize codes start at 0.
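A quick way to check the relationship between dense ranks and factorize, as a sketch on the same data. Note that sort=True is needed so the codes follow value order rather than order of appearance, and the codes start at 0 while ranks start at 1.

```python
import pandas as pd

s = pd.Series([7, 6, 7, 5, 4, 4])

# factorize returns integer codes (0-based) and the unique values;
# sort=True orders the uniques, so codes follow value order
codes, uniques = pd.factorize(s, sort=True)
dense = s.rank(method='dense')

print(codes + 1)         # 1-based codes
print(dense.to_numpy())  # dense ranks as floats
```

The two printed arrays hold the same numbers, apart from the integer versus float dtype.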
Update: per your question, let's try writing a function that behaves similarly to S1.rank():
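import numpy as np
import pandas as pd

def my_rank(s):
    # sort s by values; mergesort is stable, so ties keep their original order
    s_sorted = s.sort_values(kind='mergesort')
    # these are the incremental ranks 1..n,
    # equivalent to s.rank(method='first')
    ranks = pd.Series(np.arange(len(s_sorted)) + 1, index=s_sorted.index)
    # average the incremental ranks within each group of equal values
    avg_ranks = ranks.groupby(s_sorted).transform('mean')
    # note: the result is ordered by value, not by the original index
    return avg_ranks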
Upvotes: 2
Reputation: 490
The rank is calculated this way. Sort the elements in ascending order and number them from 1:
Elements - 4, 4, 5, 6, 7, 7
Ranks - 1, 2, 3, 4, 5, 6
Since '4' appears twice, the final rank of each occurrence is the average of 1 and 2, which is 1.5. In the same way, for 7 the final rank of each occurrence is the average of 5 and 6, which is 5.5.
Elements - 4, 4, 5, 6, 7, 7
Ranks - 1, 2, 3, 4, 5, 6
Final Rank - 1.5, 1.5, 3, 4, 5.5, 5.5
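The question assumed that the highest value gets rank 1. By default rank sorts in ascending order, which is why 4 ends up with the lowest rank. If the descending behaviour is what you want, the ascending parameter of rank can be set to False. A minimal sketch (the variable names are only illustrative):

```python
import pandas as pd

s = pd.Series([7, 6, 7, 5, 4, 4])

ascending_ranks = s.rank()                  # default: smallest value gets rank 1
descending_ranks = s.rank(ascending=False)  # largest value gets rank 1

print(pd.DataFrame({
    'value': s,
    'ascending': ascending_ranks,
    'descending': descending_ranks,
}))
```

With ascending=False the two 7s share the average of ranks 1 and 2, so each gets 1.5, and the two 4s share the average of ranks 5 and 6, so each gets 5.5.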
Upvotes: 5