Reputation: 325
I have a pandas DataFrame df with source coordinates, destination coordinates, and the price to get from source to destination.
SRCLAT SRCLONG DESTLAT DESTLONG PRICE
43.5 47.5 103.5 104 50
43.5 47.5 103.5 104 100
43.5 47.5 103.5 104 100
43.5 30 90 80 300
43.5 30 90 80 400
90 80
I'm trying to compute a percentile ranking of prices within each group of rows that share the same source and destination coordinates. The lowest price should get the top percentile, and NaNs should be ignored.
My desired output:
SRCLAT SRCLONG DESTLAT DESTLONG PRICE PERCENTILE
43.5 47.5 103.5 104 50 100% (best price out of 3)
43.5 47.5 103.5 104 100 67% (tied for 2nd out of 3)
43.5 47.5 103.5 104 100 67% (tied for 2nd out of 3)
43.5 30 90 80 300 100% (best out of 2)
43.5 30 90 80 400 50% (worst out of 2)
90 80
How would I do this?
I've tried grouping by the 4 coordinate columns with
df.groupby(['SRCLAT', 'SRCLONG', 'DESTLAT', 'DESTLONG']).size()
to get the size of each unique group, but I'm not sure where to go from here.
Upvotes: 2
Views: 136
Reputation: 51155
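Use groupby with rank, passing pct=True, ascending=False and method='max'. ascending=False gives the lowest price the highest rank, method='max' assigns tied prices the highest rank in the tie, and pct=True divides each rank by the number of non-NaN prices in the group: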
c = ['SRCLAT', 'SRCLONG', 'DESTLAT', 'DESTLONG']
d = {'pct': True, 'ascending': False, 'method': 'max'}
df.assign(PERCENTILE=df.groupby(c)['PRICE'].rank(**d))
   SRCLAT  SRCLONG  DESTLAT  DESTLONG  PRICE  PERCENTILE
0    43.5     47.5    103.5       104     50    1.000000
1    43.5     47.5    103.5       104    100    0.666667
2    43.5     47.5    103.5       104    100    0.666667
3    43.5     30.0     90.0        80    300    1.000000
4    43.5     30.0     90.0        80    400    0.500000
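If you want to run this end to end, here is a minimal self-contained sketch of the same groupby/rank approach. The DataFrame is rebuilt from the sample in the question. The incomplete last row is an assumption on my part: I've guessed that its source coordinates and price are NaN, since the question doesn't show which cells are missing. The names route, pct and result are just illustrative, and the multiply-and-round step is only there to mirror the whole-number percentages in the desired output.

```python
import numpy as np
import pandas as pd

# Sample data from the question. The placement of NaNs in the last row
# is an assumption; the original post does not show which cells are empty.
df = pd.DataFrame({
    "SRCLAT":   [43.5, 43.5, 43.5, 43.5, 43.5, np.nan],
    "SRCLONG":  [47.5, 47.5, 47.5, 30.0, 30.0, np.nan],
    "DESTLAT":  [103.5, 103.5, 103.5, 90.0, 90.0, 90.0],
    "DESTLONG": [104, 104, 104, 80, 80, 80],
    "PRICE":    [50, 100, 100, 300, 400, np.nan],
})

route = ["SRCLAT", "SRCLONG", "DESTLAT", "DESTLONG"]

# Rank prices within each route: descending order so the cheapest price
# ranks highest, ties take the highest rank in the tie, and pct=True
# scales each rank by the number of non-NaN prices in its group.
pct = df.groupby(route)["PRICE"].rank(pct=True, ascending=False, method="max")

# Express as whole-number percentages, as in the desired output.
result = df.assign(PERCENTILE=(pct * 100).round())
print(result)
```

Under that assumption, the incomplete row simply ends up with NaN in PERCENTILE and has no effect on the other groups.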
Upvotes: 1