Reputation: 147
I am trying to find the top 5 elements of the column total_petitions
, but keeping the ordered grouping I did.
df = df[['fy', 'EmployerState', 'total_petitions']]
table = df.groupby(['fy','EmployerState']).mean()
table.nlargest(5, 'total_petitions')
sample output:
fy EmployerState total_petitions
2020 WA 7039.333333
2016 MD 2647.400000
2017 MD 2313.142857
... TX 2305.541667
2020 TX 2081.952381
desired output:
fy EmployerState total_petitions
2016 AL 3.875000
AR 225.333333
AZ 26.666667
CA 326.056604
CO 21.333333
... ... ...
2020 VA 36.714286
WA 7039.333333
WI 43.750000
WV 8986086.08
WY 1.000000
with the elements of total_petitions
being the 5 states with highest means by year
Upvotes: 1
Views: 348
Reputation: 165
What you are looking for is a pivot table:
df = df.pivot_table(values='total_petitions', index=['fy','EmployerState'])
df = df.groupby(level='fy')['total_petitions'].nlargest(5).reset_index(level=0, drop=True).reset_index()
Upvotes: 2