Arthur Langlois
Arthur Langlois

Reputation: 147

Find top n elements in pandas dataframe column by keeping the grouping

I am trying to find the top 5 elements of the column total_petitions, but keeping the ordered grouping I did.

df = df[['fy', 'EmployerState', 'total_petitions']]
table = df.groupby(['fy','EmployerState']).mean()
table.nlargest(5, 'total_petitions')

sample output:

        
fy  EmployerState   total_petitions
2020    WA           7039.333333
2016    MD           2647.400000
2017    MD           2313.142857
...     TX           2305.541667
2020    TX           2081.952381

desired output:


fy  EmployerState total_petitions   
2016    AL  3.875000
        AR  225.333333
        AZ  26.666667
        CA  326.056604
        CO  21.333333
... ... ...
2020    VA  36.714286
        WA  7039.333333
        WI  43.750000
        WV  8986086.08
        WY  1.000000

with the elements of total_petitions being the 5 states with highest means by year

Upvotes: 1

Views: 348

Answers (1)

Kelly Brower
Kelly Brower

Reputation: 165

What you are looking for is a pivot table:

df = df.pivot_table(values='total_petitions', index=['fy','EmployerState'])
df = df.groupby(level='fy')['total_petitions'].nlargest(5).reset_index(level=0, drop=True).reset_index()

Upvotes: 2

Related Questions