gunardilin

Reputation: 369

Use df.agg to run a function on my dataframe

I have the following DataFrame:

tips.head()

Output:

   total_bill   tip  smoker  day    time  size   tip_pct
0       16.99  1.01      No  Sun  Dinner     2  0.059447
1       10.34  1.66      No  Sun  Dinner     3  0.160542
2       21.01  3.50      No  Sun  Dinner     3  0.166587
3       23.68  3.31      No  Sun  Dinner     2  0.139780
4       24.59  3.61      No  Sun  Dinner     4  0.146808

The following function sorts the DataFrame by the column "tip_pct" and returns the last n rows, i.e. the n rows with the largest tip_pct (n defaults to 3; here I pass n=6).

def top(df, n=3, column='tip_pct'):
    # Sort ascending by `column`, then keep the last n rows (the n largest values)
    return df.sort_values(by=column).iloc[-n:]

top(tips, n=6)

Output:

   total_bill   tip  smoker  day    time  size   tip_pct
0       16.99  1.01      No  Sun  Dinner     2  0.059447
1       10.34  1.66      No  Sun  Dinner     3  0.160542
2       21.01  3.50      No  Sun  Dinner     3  0.166587
3       23.68  3.31      No  Sun  Dinner     2  0.139780
4       24.59  3.61      No  Sun  Dinner     4  0.146808
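As a quick sanity check of what top does on its own, here is a minimal sketch on a small hypothetical DataFrame named toy (not the real tips data). It uses .iloc[-n:] for the positional slice, which behaves like [-n:] on the sorted frame, and compares the result with DataFrame.nlargest, which picks the same rows but lists them largest first.

```python
import pandas as pd


def top(df, n=3, column="tip_pct"):
    # Sort ascending by `column`, then keep the last n rows (the n largest values)
    return df.sort_values(by=column).iloc[-n:]


# Hypothetical toy data standing in for the real `tips` DataFrame
toy = pd.DataFrame(
    {
        "smoker": ["No", "Yes", "No", "Yes", "No"],
        "tip_pct": [0.05, 0.30, 0.16, 0.71, 0.14],
    }
)

result = top(toy, n=2)
print(result)

# nlargest selects the same rows, but orders them largest first
print(toy.nlargest(2, "tip_pct"))
```

Because the sort is ascending, the rows returned by top end with the single largest tip_pct.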

Next, I would like the same output as above with one difference: the rows should be grouped by "smoker".

tips.groupby('smoker').apply(top)

Output:

            total_bill   tip  smoker   day    time  size   tip_pct
smoker
No     51        10.29  2.60      No   Sun  Dinner     2  0.252672
       149        7.51  2.00      No  Thur   Lunch     2  0.266312
       232       11.61  3.39      No   Sat  Dinner     2  0.291990
Yes    67         3.07  1.00     Yes   Sat  Dinner     1  0.325733
       178        9.60  4.00     Yes   Sun  Dinner     2  0.416667
       172        7.25  5.15     Yes   Sun  Dinner     2  0.710345
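A minimal sketch of why apply works here, again on hypothetical toy data: groupby(...).apply calls top once per group, passing that group's rows as a DataFrame, and stacks the results under an outer index level holding the group key. That is the two-level smoker/row index seen in the output above. Extra keyword arguments such as n=2 are forwarded to top.

```python
import pandas as pd


def top(df, n=3, column="tip_pct"):
    # `df` is one whole group: a DataFrame with all of its columns
    return df.sort_values(by=column).iloc[-n:]


# Hypothetical toy data standing in for the real `tips` DataFrame
toy = pd.DataFrame(
    {
        "smoker": ["No", "Yes", "No", "Yes", "No", "Yes"],
        "tip_pct": [0.05, 0.30, 0.16, 0.71, 0.14, 0.42],
    }
)

# apply runs top() once per smoker group and concatenates the results,
# adding the group key as the outer level of the row index
result = toy.groupby("smoker").apply(top, n=2)
print(result)
```

Depending on the pandas version, this call may emit a DeprecationWarning about apply operating on the grouping column; it does not change which rows top selects.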

Now I would like to do the same as above but using agg:

tips.groupby('smoker').agg(top)

I then get the following error message, which I don't understand:

ValueError: Shape of passed values is (7, 2), indices imply (6, 2)

Why doesn't this work with agg? What did I do wrong? Thank you in advance.

Upvotes: 1

Views: 110

Answers (1)

jezrael

Reputation: 862581

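The reason is that GroupBy.agg is meant to return aggregated values, and it also processes each column separately. So it cannot be used here, because your top function needs all columns of a group at once: it sorts the whole group by tip_pct and returns several rows. GroupBy.apply, which passes each group to the function as a whole DataFrame, is the right tool for this.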
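To see the difference described above, here is a minimal sketch with two hypothetical helper functions, record_agg and record_apply, that only log the type of object they receive before reducing it to one value. Assuming a reasonably recent pandas, agg with a plain function hands it one column of one group at a time (a Series), whereas apply hands it the whole group (a DataFrame). A function like top, which calls sort_values(by='tip_pct') on the full group and returns several rows, therefore fits apply but not agg.

```python
import pandas as pd

seen_by_agg = []
seen_by_apply = []


def record_agg(x):
    # Log what agg passed in, then reduce it to a single value
    seen_by_agg.append(type(x).__name__)
    return x.sum()


def record_apply(group):
    # Log what apply passed in, then reduce the group to a single value
    seen_by_apply.append(type(group).__name__)
    return group["tip_pct"].max()


# Hypothetical toy data standing in for the real `tips` DataFrame
toy = pd.DataFrame(
    {
        "smoker": ["No", "Yes", "No", "Yes"],
        "tip": [1.0, 3.5, 1.7, 5.2],
        "tip_pct": [0.05, 0.30, 0.16, 0.71],
    }
)

agg_result = toy.groupby("smoker").agg(record_agg)        # one value per group and column
apply_result = toy.groupby("smoker").apply(record_apply)  # one value per group

print(set(seen_by_agg))
print(set(seen_by_apply))
```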

Upvotes: 1