wanderingstu

Reputation: 325

How to groupby 4 columns and rank based on another column?

I have a pandas DataFrame df with source coordinates, destination coordinates, and the cost (PRICE) of getting from source to destination.

SRCLAT  SRCLONG  DESTLAT  DESTLONG  PRICE
43.5    47.5     103.5    104       50
43.5    47.5     103.5    104       100
43.5    47.5     103.5    104       100
43.5    30       90       80        300
43.5    30       90       80        400
NaN     NaN      90       80        NaN
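To make the example reproducible, the sample can be built as below. This is a sketch that assumes the blank cells in the last row are missing values (NaN) and that only DESTLAT and DESTLONG are filled in for that row.

```python
import numpy as np
import pandas as pd

# Sample data from the question; the blank cells in the last row
# are assumed to be NaN.
df = pd.DataFrame({
    'SRCLAT':   [43.5, 43.5, 43.5, 43.5, 43.5, np.nan],
    'SRCLONG':  [47.5, 47.5, 47.5, 30.0, 30.0, np.nan],
    'DESTLAT':  [103.5, 103.5, 103.5, 90.0, 90.0, 90.0],
    'DESTLONG': [104.0, 104.0, 104.0, 80.0, 80.0, 80.0],
    'PRICE':    [50, 100, 100, 300, 400, np.nan],
})
print(df)
```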

I'm trying to compute a percentile ranking of prices among rows that share the same source and destination coordinates, where the lowest price gets the top percentile. Rows with NaNs should be ignored.

My desired output:

SRCLAT  SRCLONG  DESTLAT  DESTLONG  PRICE  PERCENTILE
43.5    47.5     103.5    104       50     100% (best price out of 3)
43.5    47.5     103.5    104       100    67% (tied for 2nd out of 3)
43.5    47.5     103.5    104       100    67% (tied for 2nd out of 3)
43.5    30       90       80        300    100% (best out of 2)
43.5    30       90       80        400    50% (worst out of 2)
NaN     NaN      90       80        NaN    NaN
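The desired output follows one rule: a row's percentile is the share of prices in its route group that are greater than or equal to its own price, so the cheapest price scores 100% and tied prices share the better score. A minimal pure-Python sketch of that rule, assuming rows with a NaN in the route or price should get no percentile (price_percentiles is a hypothetical helper, not a pandas function):

```python
import math
from collections import defaultdict

def price_percentiles(rows):
    """Hypothetical helper. rows: (srclat, srclong, destlat, destlong, price)
    tuples. Returns percentiles in [0, 1] aligned with rows, or None where
    the route key or the price contains a NaN (assumed to be ignored)."""
    def has_nan(values):
        return any(isinstance(v, float) and math.isnan(v) for v in values)

    # Collect the prices of every complete route.
    groups = defaultdict(list)
    for *key, price in rows:
        if not has_nan(key) and not has_nan([price]):
            groups[tuple(key)].append(price)

    result = []
    for *key, price in rows:
        if has_nan(key) or has_nan([price]):
            result.append(None)
            continue
        prices = groups[tuple(key)]
        # Share of prices on this route that are >= this row's price.
        result.append(sum(p >= price for p in prices) / len(prices))
    return result

nan = float('nan')
rows = [
    (43.5, 47.5, 103.5, 104, 50),
    (43.5, 47.5, 103.5, 104, 100),
    (43.5, 47.5, 103.5, 104, 100),
    (43.5, 30, 90, 80, 300),
    (43.5, 30, 90, 80, 400),
    (nan, nan, 90, 80, nan),
]
print(price_percentiles(rows))
```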

How would I do this?

I've tried grouping by the 4 columns with

df.groupby(['SRCLAT', 'SRCLONG', 'DESTLAT', 'DESTLONG']).size()

to get the size of each unique group, but I'm not sure where to go from here.
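size() collapses each route to a single row, so its result can't be attached to df row by row. A groupby transform instead broadcasts a per-group value back onto every original row. This sketch shows both; it uses the sample without the NaN row, and the GROUP_SIZE column name is only illustrative. transform('count') counts the non-missing prices in each route, which is the number of prices each percentile is taken over:

```python
import pandas as pd

df = pd.DataFrame({
    'SRCLAT':   [43.5, 43.5, 43.5, 43.5, 43.5],
    'SRCLONG':  [47.5, 47.5, 47.5, 30.0, 30.0],
    'DESTLAT':  [103.5, 103.5, 103.5, 90.0, 90.0],
    'DESTLONG': [104.0, 104.0, 104.0, 80.0, 80.0],
    'PRICE':    [50, 100, 100, 300, 400],
})
keys = ['SRCLAT', 'SRCLONG', 'DESTLAT', 'DESTLONG']

# One row per route: how many rows fall in each group.
sizes = df.groupby(keys).size()
print(sizes)

# Per-route count of non-missing prices, aligned with the original rows.
df['GROUP_SIZE'] = df.groupby(keys)['PRICE'].transform('count')
print(df)
```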

Upvotes: 2

Views: 136

Answers (1)

user3483203

Reputation: 51155

Use GroupBy.rank with method='max', so that tied prices share the better (higher) percentile:

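# Columns that identify a unique source -> destination route
c = ['SRCLAT', 'SRCLONG', 'DESTLAT', 'DESTLONG']
# pct=True: return the rank as a fraction of the group's non-NaN prices
# ascending=False: the lowest price gets the highest rank (1.0)
# method='max': tied prices all receive the highest rank in the tie
d = {'pct': True, 'ascending': False, 'method': 'max'}

df.assign(PERCENTILE=df.groupby(c)['PRICE'].rank(**d))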

   SRCLAT  SRCLONG  DESTLAT  DESTLONG  PRICE  PERCENTILE
0    43.5     47.5    103.5       104     50    1.000000
1    43.5     47.5    103.5       104    100    0.666667
2    43.5     47.5    103.5       104    100    0.666667
3    43.5     30.0     90.0        80    300    1.000000
4    43.5     30.0     90.0        80    400    0.500000
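To see why method='max' matters for ties, and what happens to the incomplete row, the same call can be run on the full sample (assuming the blank cells are NaN) next to method='min' for comparison. Rows whose route key contains NaN are dropped by groupby by default, and assign aligns on the index, so that row ends up with NaN:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'SRCLAT':   [43.5, 43.5, 43.5, 43.5, 43.5, np.nan],
    'SRCLONG':  [47.5, 47.5, 47.5, 30.0, 30.0, np.nan],
    'DESTLAT':  [103.5, 103.5, 103.5, 90.0, 90.0, 90.0],
    'DESTLONG': [104.0, 104.0, 104.0, 80.0, 80.0, 80.0],
    'PRICE':    [50, 100, 100, 300, 400, np.nan],
})
c = ['SRCLAT', 'SRCLONG', 'DESTLAT', 'DESTLONG']
g = df.groupby(c)['PRICE']

out = df.assign(
    # Ties take the highest rank in the tie: both 100s get 2/3.
    PCT_MAX=g.rank(pct=True, ascending=False, method='max'),
    # Ties take the lowest rank in the tie: both 100s get 1/3.
    PCT_MIN=g.rank(pct=True, ascending=False, method='min'),
)
print(out)
```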

Upvotes: 1
