Fetch the Next Record to the percentile value in a Pandas Column

Question

I have a following data Frame

Year    Sector  Number   Count
2015    AA      173      277
2015    AA      172      278
2015    AA      173      234
2015    BB      173      234
2015    BB      171      273
2015    BB      173      272
2015    CC      172      272
2015    CC      172      234
2015    CC      173      234
2015    CC      173      345
2016    AA      173      277
2016    AA      173      277
2016    BB      173      277
2016    BB      173      277
2016    CC      173      277
2016    CC      173      272
2016    CC      170      273
2016    CC      170      275

I need to calculcate the 90th percentile value of 'Count' for each group of ['Year','Sector','Number'] and return the next closest highest record in the group.

For example:

In the group 
2015    CC      172      272
2015    CC      172      234
2015    CC      173      234
2015    CC      173      345

90th percentile value is 323.1 using np.percentile() function. I would want to return value of 345 which is the next highest in the group. Any help here ?

Dimgold · Accepted Answer

You can implement it as a 5 steps process:

Group by
Finding 90% percentile
Finding all the values above
Keep the id of the minimal
Retrieve all necessary ids

assume your dataframe named df:

ids = [data[data.Count>=np.percentile(data.Count,90)].Count.idxmin()
       for group,data in df.groupby('Sector')]

df.loc[ids]

I'll break it down into steps:

1 - iterate over groups by Sector:

for group,data in df.groupby('Sector')

2 - find the percentile:

perc = np.percentile(data.Count,90)

3 - filter the values:

subdf = data[data.Count>=np.percentile(data.Count,90)]

4 - find the id of the minimal value:

subdf.Count.idmin()

5 - return the rows with minimal id:

df.loc[ids]

Fetch the Next Record to the percentile value in a Pandas Column

Answers (1)

Related Questions