Reputation: 121
I have a dataframe of economic series whose values can get revised every month, each revision adding a new `value` for a given `date` and indexing it by `realtime_start` (see the dataframe below). `realtime_start` indicates the date at which the `value` for `date` becomes valid. This value expires as soon as another one takes its place.
date | realtime_start | value |
---|---|---|
2020-11-01 | 2020-12-04 | 142629.0 |
2020-11-01 | 2021-01-08 | 142764.0 |
2020-11-01 | 2021-02-05 | 142809.0 |
2020-12-01 | 2021-01-08 | 142624.0 |
2020-12-01 | 2021-02-05 | 142582.0 |
2020-12-01 | 2021-03-05 | 142503.0 |
2021-01-01 | 2021-02-05 | 142631.0 |
2021-01-01 | 2021-03-05 | 142669.0 |
2021-01-01 | 2021-04-02 | 142736.0 |
2021-02-01 | 2021-03-05 | 143048.0 |
2021-02-01 | 2021-04-02 | 143204.0 |
2021-03-01 | 2021-04-02 | 144120.0 |
I would like an easy way to calculate the month-over-month change in `value` based on the last known entry at each `date`.

Calculation method: take the first release from month n (based on `realtime_start`) and subtract the relevant release from month n-1, where the relevant release is the most recent one whose `realtime_start` does not exceed the `realtime_start` of month n's first release.

See the desired output below:
date | MoM change |
---|---|
2020-11-01 | NaN |
2020-12-01 | -140 |
2021-01-01 | 49 |
2021-02-01 | 379 |
2021-03-01 | 916 |
For `2021-03-01`, the MoM change value is 144120.0 - 143204.0 = 916.0.
For `2021-02-01`, the MoM change value is 143048.0 - 142669.0 = 379.0.
For `2021-01-01`, the MoM change value is 142631.0 - 142582.0 = 49.0.
Similarly, I would like to calculate the year-over-year change based on the last known values at each `date` (the actual dataframe extends further into the past). I would also like to calculate the 3-month rolling average of the month-over-month change based on the last known values at each `date`.
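For anyone who wants to experiment, the sample dataframe above can be reconstructed like this (a minimal sketch; I'm assuming `date` and `realtime_start` should be datetime dtype):

```python
import pandas as pd

# Rebuild the revision table shown above
df = pd.DataFrame(
    {
        "date": ["2020-11-01"] * 3 + ["2020-12-01"] * 3
                + ["2021-01-01"] * 3 + ["2021-02-01"] * 2 + ["2021-03-01"],
        "realtime_start": [
            "2020-12-04", "2021-01-08", "2021-02-05",
            "2021-01-08", "2021-02-05", "2021-03-05",
            "2021-02-05", "2021-03-05", "2021-04-02",
            "2021-03-05", "2021-04-02",
            "2021-04-02",
        ],
        "value": [
            142629.0, 142764.0, 142809.0,
            142624.0, 142582.0, 142503.0,
            142631.0, 142669.0, 142736.0,
            143048.0, 143204.0,
            144120.0,
        ],
    }
)
df["date"] = pd.to_datetime(df["date"])
df["realtime_start"] = pd.to_datetime(df["realtime_start"])
```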
Upvotes: 3
Views: 2275
Reputation: 71689
df = df.set_index('date')
# first release for each month
first = df.groupby(level=0).first()
# releases published no later than the next month's first release
m = df['realtime_start'].le(first['realtime_start'].shift(-1))
# last qualifying value per month, shifted to align with the following month
last_val = df['value'].mask(~m).groupby(level=0).last().shift()
mom_change = (first['value'] - last_val).reset_index(name='MoM change')
Set the index of the dataframe to the column `date`, then group the dataframe on `level=0` and aggregate using `first` to select the first row for each unique `date`:
>>> first
realtime_start value
date
2020-11-01 2020-12-04 142629.0
2020-12-01 2021-01-08 142624.0
2021-01-01 2021-02-05 142631.0
2021-02-01 2021-03-05 143048.0
2021-03-01 2021-04-02 144120.0
Shift the `realtime_start` column of the `first` dataframe up by one row (`shift(-1)`, so each date is paired with the next month's first release date), then compare it with the `realtime_start` column in `df` to create a boolean mask `m`:
>>> m
date
2020-11-01 True
2020-11-01 True
2020-11-01 False
2020-12-01 True
2020-12-01 True
2020-12-01 False
2021-01-01 True
2021-01-01 True
2021-01-01 False
2021-02-01 True
2021-02-01 True
2021-03-01 False
Name: realtime_start, dtype: bool
Now mask the values in the `value` column using the above boolean mask, group the masked column on `level=0` and aggregate using `last` to select the last valid value for each unique `date`, then `shift` down one row so each month is aligned with the previous month's last known value:
>>> last_val
date
2020-11-01 NaN
2020-12-01 142764.0
2021-01-01 142582.0
2021-02-01 142669.0
2021-03-01 143204.0
Name: value, dtype: float64
Subtract the calculated `last_val` series from the `value` column of the `first` dataframe to get the MoM change:
>>> mom_change
date MoM change
0 2020-11-01 NaN
1 2020-12-01 -140.0
2 2021-01-01 49.0
3 2021-02-01 379.0
4 2021-03-01 916.0
PS: The dataframe must be sorted on the `date` column (and by `realtime_start` within each date) in order for this solution to work properly.
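Putting the steps together, here is a self-contained sketch that reconstructs the sample data inline and also tacks on the 3-month rolling average the question asked for (the rolling-average line is my extension, not part of the walkthrough above):

```python
import pandas as pd

# Reconstruct the sample data, sorted by date then realtime_start
df = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2020-11-01"] * 3 + ["2020-12-01"] * 3
            + ["2021-01-01"] * 3 + ["2021-02-01"] * 2 + ["2021-03-01"]
        ),
        "realtime_start": pd.to_datetime(
            ["2020-12-04", "2021-01-08", "2021-02-05",
             "2021-01-08", "2021-02-05", "2021-03-05",
             "2021-02-05", "2021-03-05", "2021-04-02",
             "2021-03-05", "2021-04-02",
             "2021-04-02"]
        ),
        "value": [142629.0, 142764.0, 142809.0,
                  142624.0, 142582.0, 142503.0,
                  142631.0, 142669.0, 142736.0,
                  143048.0, 143204.0,
                  144120.0],
    }
).sort_values(["date", "realtime_start"]).set_index("date")

first = df.groupby(level=0).first()
m = df["realtime_start"].le(first["realtime_start"].shift(-1))
last_val = df["value"].mask(~m).groupby(level=0).last().shift()
mom_change = (first["value"] - last_val).reset_index(name="MoM change")
# mom_change["MoM change"]: [NaN, -140.0, 49.0, 379.0, 916.0]

# 3-month rolling average of the MoM change (my extension)
mom_change["MoM 3m avg"] = mom_change["MoM change"].rolling(3).mean()
print(mom_change)
```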
Upvotes: 2