user3562812

Reputation: 1829

nlargest() on indexed series doesn't return the result from each index

I have a dataframe of election candidates, donors' occupations, and donated (received) amounts, and I am trying to find the 7 largest amounts received by each candidate.

candidate name = cand_nm
donors' occupation = contbr_occupation
received amount = contb_receipt_amt

So I first grouped the dataframe by candidate name and donor occupation, and added up the donation amounts using .sum():

grouped = df.groupby(['cand_nm','contbr_occupation'])['contb_receipt_amt'].sum()

Then I used nlargest() as below, but it returns the top 7 amounts from the entire series, not from each group. How can I calculate the top 7 donation amounts for each group?

grouped.nlargest(7)

Another question: the "grouped" variable appears to be an indexed series, but when I print its index using grouped.index, it doesn't return "cand_nm" or "contbr_occupation". Am I wrong to think that this is an indexed series?


Upvotes: 1

Views: 43

Answers (1)

jezrael

Reputation: 862891

You can use SeriesGroupBy.nlargest with group_keys=False to avoid a duplicated level in the MultiIndex:

s1 = grouped.groupby(level=0, group_keys=False).nlargest(7)
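To see what group_keys=False changes, here is a small sketch using the same toy data as the sample in this answer (the values are illustrative only, not real donation data). It compares the plain nlargest call from the question with the grouped call, with and without group_keys:

```python
import pandas as pd

# Toy data, for illustration only
df = pd.DataFrame({
    'contbr_occupation': list('abcdef'),
    'cand_nm': list('aaabbb'),
    'contb_receipt_amt': [7, 8, 9, 4, 2, 3],
})
grouped = df.groupby(['cand_nm', 'contbr_occupation'])['contb_receipt_amt'].sum()

# Plain nlargest ignores the index levels: top 2 of the whole Series
overall = grouped.nlargest(2)

# Grouping by the first index level gives the top 2 per candidate.
# With the default group_keys=True the group key is prepended again,
# so cand_nm appears as two index levels.
with_keys = grouped.groupby(level=0).nlargest(2)

# group_keys=False keeps the original two-level index
without_keys = grouped.groupby(level=0, group_keys=False).nlargest(2)

print(list(with_keys.index.names))
print(list(without_keys.index.names))
```

If the extra level is already there, with_keys.droplevel(0) should give the same two-level result.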

Or use Series.sort_values with GroupBy.head:

s1 = grouped.sort_values(ascending=False).groupby(level=0).head(7)

Sample:

import pandas as pd

df = pd.DataFrame({
        'contbr_occupation':list('abcdef'),
        'cand_nm':list('aaabbb'),
        'contb_receipt_amt':[7,8,9,4,2,3]
})

grouped = df.groupby(['cand_nm','contbr_occupation'])['contb_receipt_amt'].sum()

s1 = grouped.sort_values(ascending=False).groupby(level=0).head(2)
print (s1)
cand_nm  contbr_occupation
a        c                    9
         b                    8
b        d                    4
         f                    3
Name: contb_receipt_amt, dtype: int64

s1 = grouped.groupby(level=0, group_keys=False).nlargest(2)
print (s1)
cand_nm  contbr_occupation
a        c                    9
         b                    8
b        d                    4
         f                    3
Name: contb_receipt_amt, dtype: int64
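One caveat: both outputs above look identical only because every total for candidate a is larger than every total for candidate b. In general, sort_values followed by head keeps the globally sorted order, so rows from different candidates can be interleaved, while nlargest per group keeps each candidate's rows together. A minimal sketch with made-up numbers (the occupation labels p, q, x, y, z are hypothetical):

```python
import pandas as pd

# Made-up totals where candidate b has the single largest amount
df = pd.DataFrame({
    'cand_nm': ['a', 'a', 'b', 'b', 'b'],
    'contbr_occupation': ['p', 'q', 'x', 'y', 'z'],
    'contb_receipt_amt': [7, 1, 9, 4, 2],
})
grouped = df.groupby(['cand_nm', 'contbr_occupation'])['contb_receipt_amt'].sum()

# Globally sorted first, then the first 2 rows of each candidate:
# the result stays in global descending order
by_head = grouped.sort_values(ascending=False).groupby(level=0).head(2)

# Top 2 per candidate, candidates kept together
by_nlargest = grouped.groupby(level=0, group_keys=False).nlargest(2)

print(by_head)
print(by_nlargest)

# Same rows either way, only the ordering differs
print(by_head.sort_index().equals(by_nlargest.sort_index()))
```

If the grouped layout matters, sorting the first result by its index (or just using nlargest) is one way to get it.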

Finally, to get a DataFrame, add Series.reset_index:

df1 = s1.reset_index()
print (df1)
  cand_nm contbr_occupation  contb_receipt_amt
0       a                 c                  9
1       a                 b                  8
2       b                 d                  4
3       b                 f                  3
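On the second question about the index: the grouped Series is indexed, but by a MultiIndex, so printing grouped.index shows the MultiIndex itself rather than a plain list of column names. The level names are available through index.names. A short sketch on the same toy data as above (illustrative values only):

```python
import pandas as pd

df = pd.DataFrame({
    'contbr_occupation': list('abcdef'),
    'cand_nm': list('aaabbb'),
    'contb_receipt_amt': [7, 8, 9, 4, 2, 3],
})
grouped = df.groupby(['cand_nm', 'contbr_occupation'])['contb_receipt_amt'].sum()

# The index is a MultiIndex with one level per grouping column
index_type = type(grouped.index).__name__
level_names = list(grouped.index.names)

# Values of a single level, selected by its name
candidates = grouped.index.get_level_values('cand_nm').unique().tolist()

print(index_type)
print(level_names)
print(candidates)
```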

Upvotes: 1
