Jack
Jack

Reputation: 1754

Pandas dataframe diff except some rows?

df  
   end_date    dt_eps   
0   20200930    0.9625
1   20200630    0.5200
2   20200331    0.2130
3   20191231    1.2700
4   20190930    -0.1017
5   20190630    -0.1058
6   20190331    0.0021
7   20181231    0.0100

Note: the value of end_date must be the last day of each year quarter and the sequence is sorted by near and the type is string.

Goal

create q_dt_eps column: calculate the diff of dt_eps between the nearest day but it is the same as dt_eps when the quarter is Q1. For example, the q_dt_eps for 20200930 is 0.4425(0.9625-0.5200) while 20200331 is 1.2700.

Try

df['q_dt_eps']=df['dt_eps'].diff(periods=-1)

But it could not return the same value of dt_eps when the quarter is Q1.

Upvotes: 0

Views: 169

Answers (1)

jtorca
jtorca

Reputation: 1541

You can just convert the date to datetime, extract the quarter of the date, and then create your new column using np.where, keeping the original value when quarter is equal to 1, otherwise using the shifted value.

import numpy as np
import pandas as pd

df = pd.DataFrame({'end_date':['20200930', '20200630', '20200331', 
    '20191231', '20190930', '20190630', '20190331', '20181231'], 
    'dt_eps':[0.9625, 0.52, 0.213, 1.27, -.1017, -.1058, .0021, .01]})

df['end_date'] = pd.to_datetime(df['end_date'], format='%Y%m%d')

df['qtr'] = df['end_date'].dt.quarter

df['q_dt_eps'] = np.where(df['qtr']==1, df['dt_eps'], df['dt_eps'].diff(-1))

df
    end_date  dt_eps  qtr  q_dt_eps
0 2020-09-30  0.9625    3    0.4425
1 2020-06-30  0.5200    2    0.3070
2 2020-03-31  0.2130    1    0.2130
3 2019-12-31  1.2700    4    1.3717
4 2019-09-30 -0.1017    3    0.0041
5 2019-06-30 -0.1058    2   -0.1079
6 2019-03-31  0.0021    1    0.0021
7 2018-12-31  0.0100    4       NaN

Upvotes: 1

Related Questions