Reputation: 83
I have below data table which is at daily level (not regular gaps between data points) which I want to convert into weekly level (starting from a given date and rolling data every 7 days) in python. Data table looks like below:
ID Date Value
1 8/9/2018 2857
1 8/15/2018 2194
1 8/23/2018 226
1 8/30/2018 685
- ---- --
2 8/6/2018 1390
2 8/17/2018 1162
2 8/27/2018 6320
2 8/30/2018 1150
Output required is below:
Data rolling starts from 1st Jul'2018
ID Period Value
1 8/1/2018-8/7/2018 0
1 8/8/2018-8/14/2018 2857
1 8/15/2018-8/21/2018 2194
- ---------------- --
2 8/1/2018-8/7/2018 1390
2 8/8/2018-8/14/2018 0
2 8/15/2018-8/21/2018 1162
- ---------------- -
till 31st Jul'2020.
Upvotes: 1
Views: 3213
Reputation: 23217
Seems that you are grouping Period
and Value
(sum for same week) under the same ID
. Hence, the solution won't work without grouping by ID
.
For each month, as seen from your data, the split weeks is not to be started on any Sunday or Monday, but each week starts at 1st, 8th, 15, 22nd, 29th of the month. Hence, we have to special tailor for such week ranges.
We can do it this way:
df['Date1'] = pd.to_datetime(df['Date'])
df['week_start'] = df['Date1'] - pd.to_timedelta((df['Date1'].dt.day - 1) % 7, unit='d')
df['week_finish'] = df['week_start'] + pd.Timedelta('6D')
df['Period'] = df['week_start'].dt.strftime('%m/%d/%Y').astype(str) + '-' + df['week_finish'].dt.strftime('%m/%d/%Y').astype(str)
df.groupby(['ID', 'week_start', 'Period'])['Value'].sum().reset_index([0,2]).reset_index(drop=True)
Test Data Construction
As your sample data cannot test condition of summing up 2 entries of the same week, I have added new data with ID
equals 3
, as follows:
data = {'ID': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
'Date': ['8/9/2018',
'8/15/2018',
'8/23/2018',
'8/30/2018',
'8/6/2018',
'8/17/2018',
'8/27/2018',
'8/30/2018',
'8/6/2018',
'8/7/2018',
'8/27/2018',
'8/28/2018'],
'Value': [2857,
2194,
226,
685,
1390,
1162,
6320,
1150,
1000,
2000,
6320,
1150]}
df = pd.DataFrame(data)
print(df)
ID Date Value
0 1 8/9/2018 2857
1 1 8/15/2018 2194
2 1 8/23/2018 226
3 1 8/30/2018 685
4 2 8/6/2018 1390
5 2 8/17/2018 1162
6 2 8/27/2018 6320
7 2 8/30/2018 1150
8 3 8/6/2018 1000
9 3 8/7/2018 2000
10 3 8/27/2018 6320
11 3 8/28/2018 1150
Run New Codes
df['Date1'] = pd.to_datetime(df['Date'])
df['week_start'] = df['Date1'] - pd.to_timedelta((df['Date1'].dt.day - 1) % 7, unit='d')
df['week_finish'] = df['week_start'] + pd.Timedelta('6D')
df['Period'] = df['week_start'].dt.strftime('%m/%d/%Y').astype(str) + '-' + df['week_finish'].dt.strftime('%m/%d/%Y').astype(str)
df.groupby(['ID', 'week_start', 'Period'])['Value'].sum().reset_index([0,2]).reset_index(drop=True)
Output
ID Period Value
0 1 08/08/2018-08/14/2018 2857
1 1 08/15/2018-08/21/2018 2194
2 1 08/22/2018-08/28/2018 226
3 1 08/29/2018-09/04/2018 685
4 2 08/01/2018-08/07/2018 1390
5 2 08/15/2018-08/21/2018 1162
6 2 08/22/2018-08/28/2018 6320
7 2 08/29/2018-09/04/2018 1150
8 3 08/01/2018-08/07/2018 3000
9 3 08/22/2018-08/28/2018 7470
Note that the output is a little bit different from your desired output where every week, even without value, should still show 0 value. To support this, the codes will be much more complicated. You can already see the values grouped by ID and week in correct calendar order now (the sorting sequence of date is guaranteed by the interim field week_start
which is in YYYY-MM-DD format so that the sorting sequence of chronological order is secured. Sorting following Period
order will result in same month of different years sorted together, defeating the chronological order.
If the weeks are to be relative to 2018-08-01
instead of being the 1st day of each month, we can modify the line setting df['week_start']
, as follows:
df['week_start'] = df['Date1'] - pd.to_timedelta((df['Date1'] - pd.Timestamp('2018-08-01')).dt.days % 7, unit='d')
ID
equals 3
is now set to be in Sep 2018.
data = {'ID': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
'Date': ['8/9/2018',
'8/15/2018',
'8/23/2018',
'8/30/2018',
'8/6/2018',
'8/17/2018',
'8/27/2018',
'8/30/2018',
'9/6/2018',
'9/7/2018',
'9/27/2018',
'9/28/2018'],
'Value': [2857,
2194,
226,
685,
1390,
1162,
6320,
1150,
1000,
2000,
6320,
1150]}
df = pd.DataFrame(data)
print(df)
ID Date Value
0 1 8/9/2018 2857
1 1 8/15/2018 2194
2 1 8/23/2018 226
3 1 8/30/2018 685
4 2 8/6/2018 1390
5 2 8/17/2018 1162
6 2 8/27/2018 6320
7 2 8/30/2018 1150
8 3 9/6/2018 1000
9 3 9/7/2018 2000
10 3 9/27/2018 6320
11 3 9/28/2018 1150
Run New Codes:
df['Date1'] = pd.to_datetime(df['Date'])
df['week_start'] = df['Date1'] - pd.to_timedelta((df['Date1'] - pd.Timestamp('2018-08-01')).dt.days % 7, unit='d')
df['week_finish'] = df['week_start'] + pd.Timedelta('6D')
df['Period'] = df['week_start'].dt.strftime('%m/%d/%Y').astype(str) + '-' + df['week_finish'].dt.strftime('%m/%d/%Y').astype(str)
df.groupby(['ID', 'week_start', 'Period'])['Value'].sum().reset_index([0,2]).reset_index(drop=True)
Output
ID Period Value
0 1 08/08/2018-08/14/2018 2857
1 1 08/15/2018-08/21/2018 2194
2 1 08/22/2018-08/28/2018 226
3 1 08/29/2018-09/04/2018 685
4 2 08/01/2018-08/07/2018 1390
5 2 08/15/2018-08/21/2018 1162
6 2 08/22/2018-08/28/2018 6320
7 2 08/29/2018-09/04/2018 1150
8 3 09/05/2018-09/11/2018 3000
9 3 09/26/2018-10/02/2018 7470
Upvotes: 1
Reputation: 563
Use groupby() method. Make sure values are numeric (int, float not object)
date value
0 2018-08-09 2857
1 2018-08-15 2194
2 2018-08-23 226
3 2018-08-30 685
4 2018-08-06 1390
5 2018-08-17 1162
6 2018-08-27 6320
7 2018-08-30 1150
>>> df['week']=df['date'].transform(lambda x: x.week)
>>> df.groupby('week').sum()
value
week
32 4247
33 3356
34 226
35 8155
Upvotes: 0
Reputation: 655
Try this:
def get_week(x,start_date):
return ((x.date()-start_date.date()).days)//7
df['formatted_date'] = pd.to_datetime(df['date'])
start_date = pd.to_datetime('Add your start date')
df["week"] = df["formatted_date"].apply(lambda x:get_week(x,start_date))
df["year"] = df.formatted_date.apply(lambda x: x.year)
df.groupby(['week','year']).sum()
Upvotes: 0