Reputation: 1430
Let's say I've input dataframe like this:
|group | date |value|
A 2020-06-30 5
A 2020-06-25 3
A 2020-06-15 8
A 2020-06-08 1
B 2020-06-30 0
I want output dataframe like this (column prev_roughly_week_ago
is just a helper column and can be omitted in final df`):
|group| date | value| prev_roughly_week_ago |diff|
A 2020-06-30 5 3.0 2.0
A 2020-06-22 3 NaN NaN
A 2020-06-18 8 1.0 7.0
A 2020-06-08 1 NaN NaN
B 2020-06-30 0 NaN NaN
Basically, within every group (A, B), I want to calculate differences with values that was roughly week ago. By rough I mean 7 days but no more than 10 days (both 7 and 10 should be parameterizable). The problem is that date
is not "regular".
For example prev_roughly_week_ago
:
Difference (diff
) is just value
minus prev_roughly_week_ago
Here is an sample code to generate input/output dataframe:
import pandas as pd
import numpy as np
df = pd.DataFrame({'group': ['A', 'A', 'A', 'A', 'B'], 'date': ['2020-06-30', '2020-06-22', '2020-06-18', '2020-06-08', '2020-06-30'], 'value':[5,3, 8, 1, 0]})
df['prev_roughly_week_ago'] = [3, np.nan, 1, np.nan, np.nan]
df['diff'] = df['value'] - df['prev_roughly_week_ago']
How to implement it with pandas in clean-way?
Upvotes: 0
Views: 114
Reputation: 1481
Given your input dataframe:
df['date'] = pd.to_datetime(df['date']) #convert to datetime
df = df.sort_values(['group', 'date']).set_index('group') #sort by group and date
df['date_shifted'] = df.groupby(df.index)['date'].shift(1) #get date shifted within groups
df['date_diff'] = (df['date'] - df['date_shifted']).dt.days #calculate difference in days between current and previous dates
df['value_shifted'] = df.groupby(df.index)['value'].shift(1) #get value shifted within groups
df['diff'] = df.apply(lambda row: row['value']-row['value_shifted'] if
(row['date_diff'] >=7) & (row['date_diff'] <=10) else np.nan, axis = 1) #calculate value difference if date difference satisfies [7,10] window
Upvotes: 1