9Morgan8
9Morgan8

Reputation: 61

Subtraction of two DataFrames when Date is the same

Task I want to create excess returns by subtracting the risk-free rates in df_B from the total returns in df_A. The subtraction (total return - risk-free rate = excess return) should be done for all total returns according to the respective date's risk-free rate.

My main issue is the dfs do not have the same length, date range and general structure. To get an idea of my problem here is a simplified example of what I got and what I would like to get back.

What I have:

import pandas as pd
data_A = np.array([['A', 2], ['A', 5], ['B', 8], ['B', -7], ['B', 2], ['B', 4], ['C', 3]])
index_A = np.array([['2003-01-31'], ['2004-02-28'], ['2003-01-31'], ['2004-02-28'], ['2008-06-30'], ['2010-03-31'], ['2003-01-31']])

df_A = pd.DataFrame(data=data_A, columns=['company', 'total return'], index=index_A)

data_B = np.array([[0.5], [0.2], [0.3], [0.4], [0.1], [1.2]])
index_B = np.array([['2000-01-31'], ['2002-02-28'], ['2003-01-31'], ['2004-02-28'], ['2010-03-31'], ['2012-04-30']])

df_B = pd.DataFrame(data=data_B, columns=['risk-free rate'], index=index_B)

What I want:

data_C = np.array([['A', 1.7], ['A', 4.6], ['B', 7.7], ['B', -7.4], ['B', 2], ['B', 3.9], ['C', 2.7]])
index_C = np.array([['2003-01-31'], ['2004-02-28'], ['2003-01-31'], ['2004-02-28'], ['2008-06-30'], ['2010-03-31'], ['2003-01-31']])

df_C = pd.DataFrame(data=data_C, columns=['company', 'total return'], index=index_C)

As you see I cannot just subtract df_B from df_A since they have different lengths, the positions of the dates do not coincide due to the company sorting and not every date is covered in both dfs. So what I want is that the code looks through both dfs and subtracts the risk-free rate from all total returns with the same data e.g. all three '2003-01-31' total returns by companies A, B and C in df_A should be subtracted by the risk-free rate but not only from one of those companies and also not from more total returns than that one date. I also want the '2008-06-30' row in df_A to be left alone since there is no coinciding date in df_B. As well no '2000-01-31' or '2002-02-28' risk-free rate should be subtracted since I have no company total returns for those dates.

I tried some merging of the dfs and some for loops but this does not seem to me being the most elegant way of approaching this issue. I would be grateful for your help!

PS: This is my first post so I would also be glad for feedback on how to improve my future posts and for some editing making it more understandable!

Upvotes: 2

Views: 325

Answers (1)

Sachin Kohli
Sachin Kohli

Reputation: 1986

You can merge df_A & df_B first then do subtraction; try this

df_final = df_A.merge(df_B,left_index=True, right_index=True,how="left")
df_final["risk-free rate"].fillna(0,inplace=True)
df_final["excess return"] = df_final["total return"].astype(int) - df_final["risk-free rate"]

Output of df_final; enter image description here

Upvotes: 1

Related Questions