Reputation: 57
I was wondering if there is an elegant way to rank (calculate percentiles) across columns in a pandas DataFrame, subject to the following condition:
The percentile calculation must be done within each category. Each column belongs to a category, and within each row a column should be ranked only against the other columns of the same category (please see the link for a graphical description).
I learned that I can do the following, but it disregards the categories:
TargetRanking = StartingData.rank(axis="columns", pct=True)
But I need to group the columns by category, so that each row is ranked separately within each category.
Upvotes: 1
Views: 1015
Reputation: 29720
Assuming you have a dict mapping each column to its category, you can group the columns by that dict and then use rank as before.
categories = {'X1': 'A', 'X3': 'A', 'X5': 'A', 'X2': 'B', 'X4': 'B'}
df.set_index('Date').groupby(categories, axis=1).rank(pct=True)
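As far as I know, recent pandas releases deprecate passing axis=1 to groupby, so the one-liner above may warn or fail depending on your version. A minimal sketch of an equivalent approach is to transpose, group the (former) column labels by the dict, rank, and transpose back. The sample values and the helper name rank_within_categories are made up for illustration; only the column names and the categories dict come from the answer above.

```python
import pandas as pd

# Hypothetical sample data; the real StartingData is not shown in the question.
df = pd.DataFrame({
    "Date": ["2020-01-01", "2020-01-02"],
    "X1": [1.0, 6.0],
    "X2": [5.0, 1.0],
    "X3": [2.0, 4.0],
    "X4": [3.0, 2.0],
    "X5": [3.0, 5.0],
})

# Column -> category mapping, as in the answer above.
categories = {"X1": "A", "X3": "A", "X5": "A", "X2": "B", "X4": "B"}


def rank_within_categories(frame, mapping):
    """Percentile-rank each row's values within each column category.

    Illustrative helper (not part of pandas): it transposes so the column
    labels become the index, groups those labels via the mapping, ranks
    within each group, and transposes back to the original layout.
    """
    transposed = frame.T
    ranked = transposed.groupby(mapping).rank(pct=True)
    return ranked.T


result = rank_within_categories(df.set_index("Date"), categories)
print(result)
```

Each row is ranked independently, and a column is only compared with the other columns of its own category. This assumes every value column is numeric, since transposing a mixed-type frame would turn everything into object dtype.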
Upvotes: 1