Quant Christo
Quant Christo

Reputation: 1430

Calculating a value difference roughly week ago per group

Let's say I've input dataframe like this:

|group |    date   |value|
    A   2020-06-30  5
    A   2020-06-25  3
    A   2020-06-15  8
    A   2020-06-08  1
    B   2020-06-30  0

I want output dataframe like this (column prev_roughly_week_ago is just a helper column and can be omitted in final df`):

|group|    date   | value|  prev_roughly_week_ago   |diff|
    A   2020-06-30  5        3.0                      2.0
    A   2020-06-22  3        NaN                      NaN
    A   2020-06-18  8        1.0                      7.0
    A   2020-06-08  1        NaN                      NaN
    B   2020-06-30  0        NaN                      NaN

Basically, within every group (A, B), I want to calculate differences with values that was roughly week ago. By rough I mean 7 days but no more than 10 days (both 7 and 10 should be parameterizable). The problem is that date is not "regular".

For example prev_roughly_week_ago:

Difference (diff) is just value minus prev_roughly_week_ago

Here is an sample code to generate input/output dataframe:

import pandas as pd
import numpy as np

df = pd.DataFrame({'group': ['A', 'A', 'A', 'A', 'B'], 'date': ['2020-06-30', '2020-06-22', '2020-06-18', '2020-06-08', '2020-06-30'], 'value':[5,3, 8, 1, 0]})

df['prev_roughly_week_ago'] = [3, np.nan, 1, np.nan, np.nan]
df['diff'] = df['value'] - df['prev_roughly_week_ago']

How to implement it with pandas in clean-way?

Upvotes: 0

Views: 114

Answers (1)

Always Right Never Left
Always Right Never Left

Reputation: 1481

Given your input dataframe:

df['date'] = pd.to_datetime(df['date']) #convert to datetime
df = df.sort_values(['group', 'date']).set_index('group') #sort by group and date
df['date_shifted'] = df.groupby(df.index)['date'].shift(1) #get date shifted within groups
df['date_diff'] = (df['date'] - df['date_shifted']).dt.days #calculate difference in days between current and previous dates
df['value_shifted'] = df.groupby(df.index)['value'].shift(1) #get value shifted within groups
df['diff'] = df.apply(lambda row: row['value']-row['value_shifted'] if 
(row['date_diff'] >=7) & (row['date_diff'] <=10) else np.nan, axis = 1) #calculate value difference if date difference satisfies [7,10] window

Upvotes: 1

Related Questions