Reputation: 369
I have the following DataFrame:
tips.head()
Output:
   total_bill   tip smoker  day    time  size   tip_pct
0       16.99  1.01     No  Sun  Dinner     2  0.059447
1       10.34  1.66     No  Sun  Dinner     3  0.160542
2       21.01  3.50     No  Sun  Dinner     3  0.166587
3       23.68  3.31     No  Sun  Dinner     2  0.139780
4       24.59  3.61     No  Sun  Dinner     4  0.146808
The following function sorts the DataFrame by the column "tip_pct" and returns the last n rows of the sorted result, i.e. the n rows with the highest values (3 by default, 6 in the call below).
def top(df, n=3, column='tip_pct'):
    return df.sort_values(by=column)[-n:]
top(tips, n=6)
Output:
   total_bill   tip smoker  day    time  size   tip_pct
0       16.99  1.01     No  Sun  Dinner     2  0.059447
1       10.34  1.66     No  Sun  Dinner     3  0.160542
2       21.01  3.50     No  Sun  Dinner     3  0.166587
3       23.68  3.31     No  Sun  Dinner     2  0.139780
4       24.59  3.61     No  Sun  Dinner     4  0.146808
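For reference, here is a minimal self-contained sketch of what top returns. The frame toy and its values are made up for illustration and are not the real tips data; only the function itself comes from the question.

```python
import pandas as pd

def top(df, n=3, column='tip_pct'):
    # Sort ascending by `column`, then keep the last n rows (the n largest).
    return df.sort_values(by=column)[-n:]

# Hypothetical toy data, not the real tips dataset.
toy = pd.DataFrame({
    'smoker': ['No', 'No', 'Yes', 'Yes', 'No'],
    'tip_pct': [0.10, 0.25, 0.05, 0.30, 0.15],
})

# The two rows with the highest tip_pct, in ascending order.
result = top(toy, n=2)
print(result)
```

Because the sort is ascending and the slice takes the tail, the rows come back ordered from smaller to larger tip_pct, with their original index labels preserved.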
Next, I would like the same output as above, but grouped by "smoker":
tips.groupby('smoker').apply(top)
Output:
            total_bill   tip smoker   day    time  size   tip_pct
smoker
No     51        10.29  2.60     No   Sun  Dinner     2  0.252672
       149        7.51  2.00     No  Thur   Lunch     2  0.266312
       232       11.61  3.39     No   Sat  Dinner     2  0.291990
Yes    67         3.07  1.00    Yes   Sat  Dinner     1  0.325733
       178        9.60  4.00    Yes   Sun  Dinner     2  0.416667
       172        7.25  5.15    Yes   Sun  Dinner     2  0.710345
Now I would like to do the same as above, but using agg:
tips.groupby('smoker').agg(top)
This raises the following error message:
ValueError: Shape of passed values is (7, 2), indices imply (6, 2)
I don't understand why this doesn't work with agg.
What did I do wrong? Thank you in advance.
Upvotes: 1
Views: 110
Reputation: 862581
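The reason is that GroupBy.agg returns aggregated values and processes each column separately. It cannot be used here, because top has to work on all columns of each group at once: it sorts the whole group by one column and returns several rows, not one value per column.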
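A rough sketch of the difference, assuming a small made-up frame (toy and cols are hypothetical names, not part of the question). The columns are selected explicitly so that the grouping column is not passed to the function, which keeps the behaviour the same across pandas versions.

```python
import pandas as pd

def top(df, n=3, column='tip_pct'):
    # Needs the whole group as a DataFrame: sorts by one column, returns n rows.
    return df.sort_values(by=column)[-n:]

# Hypothetical toy data, not the real tips dataset.
toy = pd.DataFrame({
    'smoker': ['No', 'No', 'No', 'Yes', 'Yes', 'Yes'],
    'total_bill': [10.0, 20.0, 30.0, 5.0, 15.0, 25.0],
    'tip_pct': [0.10, 0.25, 0.15, 0.30, 0.05, 0.20],
})
cols = ['total_bill', 'tip_pct']

# apply: the function receives each group as a DataFrame and may return
# any number of rows; the group key becomes the outer index level.
top_rows = toy.groupby('smoker')[cols].apply(top, n=2)
print(top_rows)

# agg: each column of each group is reduced to a single value,
# so the result has exactly one row per group.
per_group_max = toy.groupby('smoker')[cols].agg('max')
print(per_group_max)
```

So apply is the right tool when a function like top returns several rows per group, while agg fits reductions such as 'max' or 'mean' that give one value per column and group.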
Upvotes: 1