Reputation: 3375
I have a dataframe of customer purchases:
   customer      shop  amount local_date
0      John  WALLMART    1.50 2019-04-10
1      John  WALLMART   40.79 2019-05-10
2      John      LIDL    2.64 2019-08-18
3      John  WALLMART   29.17 2019-02-18
4      John      LIDL   42.69 2019-07-22
5      John  WALLMART    1.50 2019-09-16
6      John  WALLMART   40.79 2019-09-17
7      Mary  WALLMART    2.64 2019-05-08
8      Mary      LIDL   29.17 2019-02-07
9      Mary  WALLMART   28.23 2019-02-21
10     Mary      ALDI    8.84 2019-10-15
11     Mary  WALLMART    5.59 2019-03-23
12     Mary      LIDL   53.09 2019-01-03
13     Mary      LIDL   46.03 2019-02-03
14     Mary  WALLMART   84.17 2019-10-18
15     Paul      LIDL    4.63 2019-02-21
16     Paul  WALLMART   19.82 2019-02-13
17     Paul      ALDI   19.02 2019-12-12
18     Paul      LIDL   41.88 2019-06-25
19     Paul      ALDI   37.79 2019-12-18
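To make this reproducible, here is a minimal sketch that rebuilds the frame from the rows listed. It assumes local_date is stored as datetime64, which the .dt accessor and pd.Grouper calls further down rely on; the variable name rows is only for illustration.

```python
import pandas as pd

# Rows copied from the table: (customer, shop, amount, local_date)
rows = [
    ("John", "WALLMART", 1.50, "2019-04-10"),
    ("John", "WALLMART", 40.79, "2019-05-10"),
    ("John", "LIDL", 2.64, "2019-08-18"),
    ("John", "WALLMART", 29.17, "2019-02-18"),
    ("John", "LIDL", 42.69, "2019-07-22"),
    ("John", "WALLMART", 1.50, "2019-09-16"),
    ("John", "WALLMART", 40.79, "2019-09-17"),
    ("Mary", "WALLMART", 2.64, "2019-05-08"),
    ("Mary", "LIDL", 29.17, "2019-02-07"),
    ("Mary", "WALLMART", 28.23, "2019-02-21"),
    ("Mary", "ALDI", 8.84, "2019-10-15"),
    ("Mary", "WALLMART", 5.59, "2019-03-23"),
    ("Mary", "LIDL", 53.09, "2019-01-03"),
    ("Mary", "LIDL", 46.03, "2019-02-03"),
    ("Mary", "WALLMART", 84.17, "2019-10-18"),
    ("Paul", "LIDL", 4.63, "2019-02-21"),
    ("Paul", "WALLMART", 19.82, "2019-02-13"),
    ("Paul", "ALDI", 19.02, "2019-12-12"),
    ("Paul", "LIDL", 41.88, "2019-06-25"),
    ("Paul", "ALDI", 37.79, "2019-12-18"),
]
df = pd.DataFrame(rows, columns=["customer", "shop", "amount", "local_date"])

# Assumption: the dates are real datetimes, not strings
df["local_date"] = pd.to_datetime(df["local_date"])
```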
I can pivot it and get the sum per customer per shop:
df.pivot_table(values='amount', index=['customer'], columns=['shop'], aggfunc='sum').reset_index().fillna(0)
shop customer   ALDI    LIDL  WALLMART
0        John   0.00   45.33    113.75
1        Mary   8.84  128.29    120.63
2        Paul  56.81   46.51     19.82
How can I get the amount they spend per month in each shop?
I have tried a few things, planning to pivot the result into the format I need:
# this makes no sense to me
df.set_index('local_date').groupby([pd.Grouper(freq='M'),'customer','shop'])['amount'].sum()
local_date  customer  shop
2019-01-31  Mary      LIDL        53.09
2019-02-28  John      WALLMART    29.17
            Mary      LIDL        75.20
                      WALLMART    28.23
            Paul      LIDL         4.63
                      WALLMART    19.82
2019-03-31  Mary      WALLMART     5.59
2019-04-30  John      WALLMART     1.50
2019-05-31  John      WALLMART    40.79
            Mary      WALLMART     2.64
2019-06-30  Paul      LIDL        41.88
2019-07-31  John      LIDL        42.69
2019-08-31  John      LIDL         2.64
2019-09-30  John      WALLMART    42.29
2019-10-31  Mary      ALDI         8.84
                      WALLMART    84.17
2019-12-31  Paul      ALDI        56.81
I've also created a dataframe by grouping by dt.month, then pivoting that, but I end up with the same pivot table I started with:
# create dataframe grouped by monthly sum
newd = df.groupby([df.local_date.dt.month,'customer','shop'])['amount'].sum().to_frame()
#pivoting
newd.pivot_table(values='amount', index=['customer'], columns=['shop'], aggfunc='sum').reset_index().fillna(0)
shop customer   ALDI    LIDL  WALLMART
0        John   0.00   45.33    113.75
1        Mary   8.84  128.29    120.63
2        Paul  56.81   46.51     19.82
Upvotes: 1
Views: 36
Reputation: 150765
Group by the month with to_period:
df.groupby([df['local_date'].dt.to_period('M'),'customer','shop'])['amount'].sum()
Output:
local_date  customer  shop
2019-01     Mary      LIDL        53.09
2019-02     John      WALLMART    29.17
            Mary      LIDL        75.20
                      WALLMART    28.23
            Paul      LIDL         4.63
                      WALLMART    19.82
2019-03     Mary      WALLMART     5.59
2019-04     John      WALLMART     1.50
2019-05     John      WALLMART    40.79
            Mary      WALLMART     2.64
2019-06     Paul      LIDL        41.88
2019-07     John      LIDL        42.69
2019-08     John      LIDL         2.64
2019-09     John      WALLMART    42.29
2019-10     Mary      ALDI         8.84
                      WALLMART    84.17
2019-12     Paul      ALDI        56.81
Name: amount, dtype: float64
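A note on why to_period rather than dt.month: dt.month keeps only the month number, so the same month in different years would land in one group, while to_period('M') keeps the year. All the dates in the question are from 2019, so it makes no difference there. The sketch below uses a made-up two-year sample (not data from the question) to show the difference.

```python
import pandas as pd

# Hypothetical sample spanning two years, for illustration only
s = pd.DataFrame({
    "local_date": pd.to_datetime(["2019-02-03", "2020-02-10", "2019-03-01"]),
    "amount": [10.0, 5.0, 2.0],
})

# dt.month keeps only the month number, so Feb 2019 and Feb 2020 share a group
by_month_number = s.groupby(s["local_date"].dt.month)["amount"].sum()

# to_period("M") keeps the year, so the two Februaries stay separate
by_period = s.groupby(s["local_date"].dt.to_period("M"))["amount"].sum()

print(by_month_number)
print(by_period)
```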
If you want the shops as columns, you can unstack:
(df.groupby([df['local_date'].dt.to_period('M'),'customer','shop'])['amount']
.sum().unstack('shop', fill_value=0)
)
Output:
shop                  ALDI   LIDL  WALLMART
local_date customer
2019-01    Mary       0.00  53.09      0.00
2019-02    John       0.00   0.00     29.17
           Mary       0.00  75.20     28.23
           Paul       0.00   4.63     19.82
2019-03    Mary       0.00   0.00      5.59
2019-04    John       0.00   0.00      1.50
2019-05    John       0.00   0.00     40.79
           Mary       0.00   0.00      2.64
2019-06    Paul       0.00  41.88      0.00
2019-07    John       0.00  42.69      0.00
2019-08    John       0.00   2.64      0.00
2019-09    John       0.00   0.00     42.29
2019-10    Mary       8.84   0.00     84.17
2019-12    Paul      56.81   0.00      0.00
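Since the question started from pivot_table, the same shape should also be reachable that way, as long as the month is one of the pivot keys. The second attempt in the question collapsed back to the per-customer totals because the month was left out of index and columns, so pivot_table summed over it again. A minimal sketch on a few rows from the question; the column name month is my own choice, not something from the original frame.

```python
import pandas as pd

# A few rows from the question, enough to show the shape of the result
df = pd.DataFrame({
    "customer": ["Mary", "Mary", "Mary", "Mary", "Paul", "Paul"],
    "shop": ["LIDL", "LIDL", "WALLMART", "WALLMART", "LIDL", "WALLMART"],
    "amount": [29.17, 46.03, 28.23, 5.59, 4.63, 19.82],
    "local_date": pd.to_datetime([
        "2019-02-07", "2019-02-03", "2019-02-21",
        "2019-03-23", "2019-02-21", "2019-02-13",
    ]),
})

# "month" is a helper column added for the pivot (hypothetical name)
monthly = (
    df.assign(month=df["local_date"].dt.to_period("M"))
      .pivot_table(
          values="amount",
          index=["month", "customer"],  # keeping month here stops it being summed away
          columns="shop",
          aggfunc="sum",
          fill_value=0,
      )
)
print(monthly)
```

Moving "month" from index to columns (columns=["shop", "month"]) would instead give one row per customer with a shop/month column hierarchy, if that is closer to the layout you need.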
Upvotes: 1