ceeka9388
ceeka9388

Reputation: 69

Fetch the Next Record to the percentile value in a Pandas Column

I have a following data Frame

Year    Sector  Number   Count
2015    AA      173      277
2015    AA      172      278
2015    AA      173      234
2015    BB      173      234
2015    BB      171      273
2015    BB      173      272
2015    CC      172      272
2015    CC      172      234
2015    CC      173      234
2015    CC      173      345
2016    AA      173      277
2016    AA      173      277
2016    BB      173      277
2016    BB      173      277
2016    CC      173      277
2016    CC      173      272
2016    CC      170      273
2016    CC      170      275

I need to calculcate the 90th percentile value of 'Count' for each group of ['Year','Sector','Number'] and return the next closest highest record in the group.

For example:

In the group 
2015    CC      172      272
2015    CC      172      234
2015    CC      173      234
2015    CC      173      345

90th percentile value is 323.1 using np.percentile() function. I would want to return value of 345 which is the next highest in the group. Any help here ?

Upvotes: 1

Views: 109

Answers (1)

Dimgold
Dimgold

Reputation: 2944

You can implement it as a 5 steps process:

  • Group by
  • Finding 90% percentile
  • Finding all the values above
  • Keep the id of the minimal
  • Retrieve all necessary ids

assume your dataframe named df:

ids = [data[data.Count>=np.percentile(data.Count,90)].Count.idxmin()
       for group,data in df.groupby('Sector')]

df.loc[ids]

I'll break it down into steps:

1 - iterate over groups by Sector:

for group,data in df.groupby('Sector')

2 - find the percentile:

perc = np.percentile(data.Count,90)

3 - filter the values:

subdf = data[data.Count>=np.percentile(data.Count,90)]

4 - find the id of the minimal value:

subdf.Count.idmin()

5 - return the rows with minimal id:

df.loc[ids]

Upvotes: 1

Related Questions