Reputation: 2110
To better explain by problem better lets pretend i have a shop with 3 unique customers and my dataframe contains every purchase of my customers with weekday, name and paid price.
name price weekday
0 Paul 18.44 0
1 Micky 0.70 0
2 Sarah 0.59 0
3 Sarah 0.27 1
4 Paul 3.45 2
5 Sarah 14.03 2
6 Paul 17.21 3
7 Micky 5.35 3
8 Sarah 0.49 4
9 Micky 17.00 4
10 Paul 2.62 4
11 Micky 17.61 5
12 Micky 10.63 6
The information i would like to get is the average price per unique customer per weekday. What i often do in similar situations is to group by several columns with sum and then take the average of a subset of the columns.
df = df.groupby(['name','weekday']).sum()
price
name weekday
Micky 0 0.70
3 5.35
4 17.00
5 17.61
6 10.63
Paul 0 18.44
2 3.45
3 17.21
4 2.62
Sarah 0 0.59
1 0.27
2 14.03
4 0.49
df = df.groupby(['weekday']).mean()
price
weekday
0 6.576667
1 0.270000
2 8.740000
3 11.280000
4 6.703333
5 17.610000
6 10.630000
Of course this only works if all my unique customers would have at least one purchase per day. Is there an elegant way to get a zero value for all combinations between unique index values that have no sum after the first groupby?
My solutions has been so far to either to reindex on a multi index i created from the unique values of the grouped columns or the combination of unstack-fillna-stack but both solutions do not really satisfy me.
Appreciate your help!
Upvotes: 1
Views: 682
Reputation: 153510
IIUC, let's use unstack
and fillna
then stack
:
df_out = df.groupby(['name','weekday']).sum().unstack().fillna(0).stack()
Output:
price
name weekday
Micky 0 0.70
1 0.00
2 0.00
3 5.35
4 17.00
5 17.61
6 10.63
Paul 0 18.44
1 0.00
2 3.45
3 17.21
4 2.62
5 0.00
6 0.00
Sarah 0 0.59
1 0.27
2 14.03
3 0.00
4 0.49
5 0.00
6 0.00
And,
df_out.groupby('weekday').mean()
Output:
price
weekday
0 6.576667
1 0.090000
2 5.826667
3 7.520000
4 6.703333
5 5.870000
6 3.543333
Upvotes: 2
Reputation: 62017
I think you can use pivot_table
to do all the steps at once. I'm not exactly sure what you want but the default aggregation from pivot_table
is the mean. You can change it to 'sum'.
df1 = df.pivot_table(index='name', columns='weekday', values='price',
fill_value=0, aggfunc='sum')
weekday 0 1 2 3 4 5 6
name
Micky 0.70 0.00 0.00 5.35 17.00 17.61 10.63
Paul 18.44 0.00 3.45 17.21 2.62 0.00 0.00
Sarah 0.59 0.27 14.03 0.00 0.49 0.00 0.00
And then take the mean of each column.
df1.mean()
weekday
0 6.576667
1 0.090000
2 5.826667
3 7.520000
4 6.703333
5 5.870000
6 3.543333
Upvotes: 1