Reputation: 399
I have a dataframe with values per day (see df below). I want to group the "Forecast" field per week but with Monday as the first day of the week.
Currently I can do it via pd.TimeGrouper('W') (see df_final below) but it groups the week starting on Sundays (see df_final below)
import pandas as pd
data = [("W1","G1",1234,pd.to_datetime("2015-07-1"),8),
("W1","G1",1234,pd.to_datetime("2015-07-30"),2),
("W1","G1",1234,pd.to_datetime("2015-07-15"),2),
("W1","G1",1234,pd.to_datetime("2015-07-2"),4),
("W1","G2",2345,pd.to_datetime("2015-07-5"),5),
("W1","G2",2345,pd.to_datetime("2015-07-7"),1),
("W1","G2",2345,pd.to_datetime("2015-07-9"),1),
("W1","G2",2345,pd.to_datetime("2015-07-11"),3)]
labels = ["Site","Type","Product","Date","Forecast"]
df = pd.DataFrame(data,columns=labels).set_index(["Site","Type","Product","Date"])
df
Forecast
Site Type Product Date
W1 G1 1234 2015-07-01 8
2015-07-30 2
2015-07-15 2
2015-07-02 4
G2 2345 2015-07-05 5
2015-07-07 1
2015-07-09 1
2015-07-11 3
df_final = (df
.reset_index()
.set_index("Date")
.groupby(["Site","Product",pd.TimeGrouper('W')])["Forecast"].sum()
.astype(int)
.reset_index())
df_final["DayOfWeek"] = df_final["Date"].dt.dayofweek
df_final
Site Product Date Forecast DayOfWeek
0 W1 1234 2015-07-05 12 6
1 W1 1234 2015-07-19 2 6
2 W1 1234 2015-08-02 2 6
3 W1 2345 2015-07-05 5 6
4 W1 2345 2015-07-12 5 6
Upvotes: 8
Views: 10201
Reputation: 6114
I have three solutions to this problem as described below. First, I should state that the ex-accepted answer is incorrect. Here is why:
# let's create an example df of length 9, 2020-03-08 is a Sunday
s = pd.DataFrame({'dt':pd.date_range('2020-03-08', periods=9, freq='D'),
'counts':0})
> s
dt | counts | |
---|---|---|
0 | 2020-03-08 00:00:00 | 0 |
1 | 2020-03-09 00:00:00 | 0 |
2 | 2020-03-10 00:00:00 | 0 |
3 | 2020-03-11 00:00:00 | 0 |
4 | 2020-03-12 00:00:00 | 0 |
5 | 2020-03-13 00:00:00 | 0 |
6 | 2020-03-14 00:00:00 | 0 |
7 | 2020-03-15 00:00:00 | 0 |
8 | 2020-03-16 00:00:00 | 0 |
These nine days span three Monday-to-Sunday weeks. The weeks of March 2nd, 9th, and 16th. Let's try the accepted answer:
# the accepted answer
> s.groupby(pd.Grouper(key='dt',freq='W-Mon')).count()
dt | counts |
---|---|
2020-03-09 00:00:00 | 2 |
2020-03-16 00:00:00 | 7 |
This is wrong because the OP wants to have "Monday as the first day of the week" (not as the last day of the week) in the resulting dataframe. Let's see what we get when we try with freq='W'
> s.groupby(pd.Grouper(key='dt', freq='W')).count()
dt | counts |
---|---|
2020-03-08 00:00:00 | 1 |
2020-03-15 00:00:00 | 7 |
2020-03-22 00:00:00 | 1 |
This grouper actually grouped as we wanted (Monday to Sunday) but labeled the 'dt' with the END of the week, rather than the start. So, to get what we want, we can move the index by 6 days like:
w = s.groupby(pd.Grouper(key='dt', freq='W')).count()
w.index -= pd.Timedelta(days=6)
or alternatively we can do:
s.groupby(pd.Grouper(key='dt',freq='W-Mon',label='left',closed='left')).count()
a third solution, arguably the most readable one, is converting dt
to period first, then grouping, and finally (if needed) converting back to timestamp:
s.groupby(s.dt.dt.to_period('W'))['counts'].count().to_timestamp()
# a variant of this solution is: s.set_index('dt').to_period('W').groupby(pd.Grouper(freq='W')).count().to_timestamp()
all of these solutions return what the OP asked for:
dt | counts |
---|---|
2020-03-02 00:00:00 | 1 |
2020-03-09 00:00:00 | 7 |
2020-03-16 00:00:00 | 1 |
Explanation: when freq
is provided to pd.Grouper
, both closed
and label
kwargs default to right
. Setting freq
to W
(short for W-Sun
) works because we want our week to end on Sunday (Sunday included, and g.closed == 'right'
handles this). Unfortunately, the pd.Grouper
docstring does not show the default values but you can see them like this:
g = pd.Grouper(key='dt', freq='W')
print(g.closed, g.label)
> right right
Upvotes: 6
Reputation: 862511
Use W-MON
instead W
, check anchored offsets:
df_final = (df
.reset_index()
.set_index("Date")
.groupby(["Site","Product",pd.Grouper(freq='W-MON')])["Forecast"].sum()
.astype(int)
.reset_index())
df_final["DayOfWeek"] = df_final["Date"].dt.dayofweek
print (df_final)
Site Product Date Forecast DayOfWeek
0 W1 1234 2015-07-06 12 0
1 W1 1234 2015-07-20 2 0
2 W1 1234 2015-08-03 2 0
3 W1 2345 2015-07-06 5 0
4 W1 2345 2015-07-13 5 0
Upvotes: 9