P.Tillmann
P.Tillmann

Reputation: 2110

Fill zero values for combinations of unique multi-index values after groupby

To better explain by problem better lets pretend i have a shop with 3 unique customers and my dataframe contains every purchase of my customers with weekday, name and paid price.

     name  price  weekday
0    Paul  18.44        0
1   Micky   0.70        0
2   Sarah   0.59        0
3   Sarah   0.27        1
4    Paul   3.45        2
5   Sarah  14.03        2
6    Paul  17.21        3
7   Micky   5.35        3
8   Sarah   0.49        4
9   Micky  17.00        4
10   Paul   2.62        4
11  Micky  17.61        5
12  Micky  10.63        6

The information i would like to get is the average price per unique customer per weekday. What i often do in similar situations is to group by several columns with sum and then take the average of a subset of the columns.

df = df.groupby(['name','weekday']).sum()



              price
name  weekday       
Micky 0         0.70
      3         5.35
      4        17.00
      5        17.61
      6        10.63
Paul  0        18.44
      2         3.45
      3        17.21
      4         2.62
Sarah 0         0.59
      1         0.27
      2        14.03
      4         0.49

df = df.groupby(['weekday']).mean()

             price
weekday           
0         6.576667
1         0.270000
2         8.740000
3        11.280000
4         6.703333
5        17.610000
6        10.630000

Of course this only works if all my unique customers would have at least one purchase per day. Is there an elegant way to get a zero value for all combinations between unique index values that have no sum after the first groupby?

My solutions has been so far to either to reindex on a multi index i created from the unique values of the grouped columns or the combination of unstack-fillna-stack but both solutions do not really satisfy me.

Appreciate your help!

Upvotes: 1

Views: 682

Answers (2)

Scott Boston
Scott Boston

Reputation: 153510

IIUC, let's use unstack and fillna then stack:

df_out = df.groupby(['name','weekday']).sum().unstack().fillna(0).stack()

Output:

               price
name  weekday       
Micky 0         0.70
      1         0.00
      2         0.00
      3         5.35
      4        17.00
      5        17.61
      6        10.63
Paul  0        18.44
      1         0.00
      2         3.45
      3        17.21
      4         2.62
      5         0.00
      6         0.00
Sarah 0         0.59
      1         0.27
      2        14.03
      3         0.00
      4         0.49
      5         0.00
      6         0.00

And,

df_out.groupby('weekday').mean()

Output:

            price
weekday          
0        6.576667
1        0.090000
2        5.826667
3        7.520000
4        6.703333
5        5.870000
6        3.543333

Upvotes: 2

Ted Petrou
Ted Petrou

Reputation: 62017

I think you can use pivot_table to do all the steps at once. I'm not exactly sure what you want but the default aggregation from pivot_table is the mean. You can change it to 'sum'.

df1 = df.pivot_table(index='name', columns='weekday', values='price', 
                     fill_value=0, aggfunc='sum')

weekday      0     1      2      3      4      5      6
name                                                   
Micky     0.70  0.00   0.00   5.35  17.00  17.61  10.63
Paul     18.44  0.00   3.45  17.21   2.62   0.00   0.00
Sarah     0.59  0.27  14.03   0.00   0.49   0.00   0.00

And then take the mean of each column.

df1.mean()

weekday
0    6.576667
1    0.090000
2    5.826667
3    7.520000
4    6.703333
5    5.870000
6    3.543333

Upvotes: 1

Related Questions