adi5257
adi5257

Reputation: 83

How to create a weekly time-series from a daily data in pandas

I have below data table which is at daily level (not regular gaps between data points) which I want to convert into weekly level (starting from a given date and rolling data every 7 days) in python. Data table looks like below:

ID Date Value
1  8/9/2018 2857
1  8/15/2018 2194
1  8/23/2018 226
1  8/30/2018 685
-     ----    --
2  8/6/2018 1390
2  8/17/2018 1162
2  8/27/2018 6320
2  8/30/2018 1150

Output required is below:

Data rolling starts from 1st Jul'2018

ID       Period             Value
1  8/1/2018-8/7/2018          0
1  8/8/2018-8/14/2018        2857
1  8/15/2018-8/21/2018       2194
-   ----------------          --
2  8/1/2018-8/7/2018         1390
2  8/8/2018-8/14/2018         0
2  8/15/2018-8/21/2018       1162
-   ----------------          -
 till 31st Jul'2020.

Upvotes: 1

Views: 3213

Answers (3)

SeaBean
SeaBean

Reputation: 23217

Seems that you are grouping Period and Value (sum for same week) under the same ID. Hence, the solution won't work without grouping by ID.

For each month, as seen from your data, the split weeks is not to be started on any Sunday or Monday, but each week starts at 1st, 8th, 15, 22nd, 29th of the month. Hence, we have to special tailor for such week ranges.

We can do it this way:

df['Date1'] = pd.to_datetime(df['Date'])
df['week_start'] = df['Date1'] - pd.to_timedelta((df['Date1'].dt.day - 1) % 7, unit='d')
df['week_finish'] = df['week_start'] + pd.Timedelta('6D')
df['Period'] = df['week_start'].dt.strftime('%m/%d/%Y').astype(str) + '-' + df['week_finish'].dt.strftime('%m/%d/%Y').astype(str)
df.groupby(['ID', 'week_start', 'Period'])['Value'].sum().reset_index([0,2]).reset_index(drop=True)

Demo

Test Data Construction As your sample data cannot test condition of summing up 2 entries of the same week, I have added new data with ID equals 3, as follows:

data = {'ID': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
 'Date': ['8/9/2018',
  '8/15/2018',
  '8/23/2018',
  '8/30/2018',
  '8/6/2018',
  '8/17/2018',
  '8/27/2018',
  '8/30/2018',
  '8/6/2018',
  '8/7/2018',
  '8/27/2018',
  '8/28/2018'],
 'Value': [2857,
  2194,
  226,
  685,
  1390,
  1162,
  6320,
  1150,
  1000,
  2000,
  6320,
  1150]}

df = pd.DataFrame(data)

print(df)

    ID       Date  Value
0    1   8/9/2018   2857
1    1  8/15/2018   2194
2    1  8/23/2018    226
3    1  8/30/2018    685
4    2   8/6/2018   1390
5    2  8/17/2018   1162
6    2  8/27/2018   6320
7    2  8/30/2018   1150
8    3   8/6/2018   1000
9    3   8/7/2018   2000
10   3  8/27/2018   6320
11   3  8/28/2018   1150

Run New Codes

df['Date1'] = pd.to_datetime(df['Date'])
df['week_start'] = df['Date1'] - pd.to_timedelta((df['Date1'].dt.day - 1) % 7, unit='d')
df['week_finish'] = df['week_start'] + pd.Timedelta('6D')
df['Period'] = df['week_start'].dt.strftime('%m/%d/%Y').astype(str) + '-' + df['week_finish'].dt.strftime('%m/%d/%Y').astype(str)
df.groupby(['ID', 'week_start', 'Period'])['Value'].sum().reset_index([0,2]).reset_index(drop=True)

Output

   ID                 Period  Value
0   1  08/08/2018-08/14/2018   2857
1   1  08/15/2018-08/21/2018   2194
2   1  08/22/2018-08/28/2018    226
3   1  08/29/2018-09/04/2018    685
4   2  08/01/2018-08/07/2018   1390
5   2  08/15/2018-08/21/2018   1162
6   2  08/22/2018-08/28/2018   6320
7   2  08/29/2018-09/04/2018   1150
8   3  08/01/2018-08/07/2018   3000
9   3  08/22/2018-08/28/2018   7470

Note that the output is a little bit different from your desired output where every week, even without value, should still show 0 value. To support this, the codes will be much more complicated. You can already see the values grouped by ID and week in correct calendar order now (the sorting sequence of date is guaranteed by the interim field week_start which is in YYYY-MM-DD format so that the sorting sequence of chronological order is secured. Sorting following Period order will result in same month of different years sorted together, defeating the chronological order.

Edit

If the weeks are to be relative to 2018-08-01 instead of being the 1st day of each month, we can modify the line setting df['week_start'], as follows:

df['week_start'] = df['Date1'] - pd.to_timedelta((df['Date1'] - pd.Timestamp('2018-08-01')).dt.days % 7, unit='d')

Test Run

ID equals 3 is now set to be in Sep 2018.

data = {'ID': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
 'Date': ['8/9/2018',
  '8/15/2018',
  '8/23/2018',
  '8/30/2018',
  '8/6/2018',
  '8/17/2018',
  '8/27/2018',
  '8/30/2018',
  '9/6/2018',
  '9/7/2018',
  '9/27/2018',
  '9/28/2018'],
 'Value': [2857,
  2194,
  226,
  685,
  1390,
  1162,
  6320,
  1150,
  1000,
  2000,
  6320,
  1150]}

df = pd.DataFrame(data)

print(df)


    ID       Date  Value
0    1   8/9/2018   2857
1    1  8/15/2018   2194
2    1  8/23/2018    226
3    1  8/30/2018    685
4    2   8/6/2018   1390
5    2  8/17/2018   1162
6    2  8/27/2018   6320
7    2  8/30/2018   1150
8    3   9/6/2018   1000
9    3   9/7/2018   2000
10   3  9/27/2018   6320
11   3  9/28/2018   1150

Run New Codes:

df['Date1'] = pd.to_datetime(df['Date'])
df['week_start'] = df['Date1'] - pd.to_timedelta((df['Date1'] - pd.Timestamp('2018-08-01')).dt.days % 7, unit='d')
df['week_finish'] = df['week_start'] + pd.Timedelta('6D')
df['Period'] = df['week_start'].dt.strftime('%m/%d/%Y').astype(str) + '-' + df['week_finish'].dt.strftime('%m/%d/%Y').astype(str)
df.groupby(['ID', 'week_start', 'Period'])['Value'].sum().reset_index([0,2]).reset_index(drop=True)

Output

   ID                 Period  Value
0   1  08/08/2018-08/14/2018   2857
1   1  08/15/2018-08/21/2018   2194
2   1  08/22/2018-08/28/2018    226
3   1  08/29/2018-09/04/2018    685
4   2  08/01/2018-08/07/2018   1390
5   2  08/15/2018-08/21/2018   1162
6   2  08/22/2018-08/28/2018   6320
7   2  08/29/2018-09/04/2018   1150
8   3  09/05/2018-09/11/2018   3000
9   3  09/26/2018-10/02/2018   7470

Upvotes: 1

Jacek Błocki
Jacek Błocki

Reputation: 563

Use groupby() method. Make sure values are numeric (int, float not object)

        date  value
0 2018-08-09   2857
1 2018-08-15   2194
2 2018-08-23    226
3 2018-08-30    685
4 2018-08-06   1390
5 2018-08-17   1162
6 2018-08-27   6320
7 2018-08-30   1150
>>> df['week']=df['date'].transform(lambda x: x.week)
>>> df.groupby('week').sum()
      value
week       
32     4247
33     3356
34      226
35     8155

Upvotes: 0

Rishin Rahim
Rishin Rahim

Reputation: 655

Try this:

def get_week(x,start_date):
    return ((x.date()-start_date.date()).days)//7

df['formatted_date'] = pd.to_datetime(df['date'])
start_date = pd.to_datetime('Add your start date')

df["week"] = df["formatted_date"].apply(lambda x:get_week(x,start_date))
df["year"] = df.formatted_date.apply(lambda x: x.year)
    
df.groupby(['week','year']).sum() 

Upvotes: 0

Related Questions