CSBossmann
CSBossmann

Reputation: 201

Panda sum columns with different start date

I am trying to sum panda data frames with different start dates.

If there is no value for df2 I want to use 0, but I cant figure out how df1.fillna(0) also didn't work. Any ideas?

What I have

df1

                COL   
2020-06-03      50      
2020-06-04      51      
2020-06-05      52      
2020-06-06      49      
2020-06-07      46      

df2

                COL    
2020-06-03            
2020-06-04            
2020-06-05           
2020-06-06      10      
2020-06-07      10      

What I do

portfolio = df1 + df2

or

portfolio = df1.fillna(0) + df2fillna(0)

What I get

df3

                COL    
2020-06-03      NaN
2020-06-04      NaN      
2020-06-05      NaN      
2020-06-06      59
2020-06-07      56      

What I want

df3

                COL    
2020-06-03      50      
2020-06-04      51      
2020-06-05      52 
2020-06-06      59
2020-06-07      56      

Upvotes: 1

Views: 41

Answers (1)

Ian
Ian

Reputation: 3898

How about this.

# setup

import pandas as pd
import numpy as np

df1 = pd.DataFrame({
    "COL": [50, 51, 52, 49, 46]
},
index=['2020-06-03', '2020-06-04', '2020-06-05', '2020-06-06', '2020-06-07'])

df2 = pd.DataFrame({
    "COL": [np.nan, np.nan, np.nan, 10, 10]
},
index=['2020-06-03', '2020-06-04', '2020-06-05', '2020-06-06', '2020-06-07'])

# logic 
pd.concat([df1, df2]).reset_index().groupby('index').sum()

Output:

            COL
index   
2020-06-03  50.0
2020-06-04  51.0
2020-06-05  52.0
2020-06-06  59.0
2020-06-07  56.0

Upvotes: 2

Related Questions