Reputation: 69
I have a following data Frame
Year Sector Number Count
2015 AA 173 277
2015 AA 172 278
2015 AA 173 234
2015 BB 173 234
2015 BB 171 273
2015 BB 173 272
2015 CC 172 272
2015 CC 172 234
2015 CC 173 234
2015 CC 173 345
2016 AA 173 277
2016 AA 173 277
2016 BB 173 277
2016 BB 173 277
2016 CC 173 277
2016 CC 173 272
2016 CC 170 273
2016 CC 170 275
I need to calculcate the 90th percentile value of 'Count' for each group of ['Year','Sector','Number'] and return the next closest highest record in the group.
For example:
In the group
2015 CC 172 272
2015 CC 172 234
2015 CC 173 234
2015 CC 173 345
90th percentile value is 323.1 using np.percentile() function. I would want to return value of 345 which is the next highest in the group. Any help here ?
Upvotes: 1
Views: 109
Reputation: 2944
You can implement it as a 5 steps process:
assume your dataframe named df:
ids = [data[data.Count>=np.percentile(data.Count,90)].Count.idxmin()
for group,data in df.groupby('Sector')]
df.loc[ids]
I'll break it down into steps:
1 - iterate over groups by Sector:
for group,data in df.groupby('Sector')
2 - find the percentile:
perc = np.percentile(data.Count,90)
3 - filter the values:
subdf = data[data.Count>=np.percentile(data.Count,90)]
4 - find the id of the minimal value:
subdf.Count.idmin()
5 - return the rows with minimal id:
df.loc[ids]
Upvotes: 1