Reputation: 1754
df
end_date dt_eps
0 20200930 0.9625
1 20200630 0.5200
2 20200331 0.2130
3 20191231 1.2700
4 20190930 -0.1017
5 20190630 -0.1058
6 20190331 0.0021
7 20181231 0.0100
Note: the value of end_date
must be the last day of each year quarter and the sequence is sorted by near and the type is string.
Goal
create q_dt_eps
column: calculate the diff of dt_eps
between the nearest day but it is the same as dt_eps
when the quarter is Q1. For example, the q_dt_eps
for 20200930
is 0.4425(0.9625-0.5200) while 20200331
is 1.2700.
Try
df['q_dt_eps']=df['dt_eps'].diff(periods=-1)
But it could not return the same value of dt_eps when the quarter is Q1.
Upvotes: 0
Views: 169
Reputation: 1541
You can just convert the date to datetime
, extract the quarter of the date, and then create your new column using np.where
, keeping the original value when quarter is equal to 1, otherwise using the shifted value.
import numpy as np
import pandas as pd
df = pd.DataFrame({'end_date':['20200930', '20200630', '20200331',
'20191231', '20190930', '20190630', '20190331', '20181231'],
'dt_eps':[0.9625, 0.52, 0.213, 1.27, -.1017, -.1058, .0021, .01]})
df['end_date'] = pd.to_datetime(df['end_date'], format='%Y%m%d')
df['qtr'] = df['end_date'].dt.quarter
df['q_dt_eps'] = np.where(df['qtr']==1, df['dt_eps'], df['dt_eps'].diff(-1))
df
end_date dt_eps qtr q_dt_eps
0 2020-09-30 0.9625 3 0.4425
1 2020-06-30 0.5200 2 0.3070
2 2020-03-31 0.2130 1 0.2130
3 2019-12-31 1.2700 4 1.3717
4 2019-09-30 -0.1017 3 0.0041
5 2019-06-30 -0.1058 2 -0.1079
6 2019-03-31 0.0021 1 0.0021
7 2018-12-31 0.0100 4 NaN
Upvotes: 1