mobelahcen
mobelahcen

Reputation: 424

Sum two dataframes with different indexing

I have two dataframes with different indexing that I want to sum the same column from the two dataframes. Based on a suggestion, I tried the following but removes disregards other columns like cat

df = df.set_index('date')
tmp = tmp.set_index('date')
result = df['Anomaly'].add(tmp['Anomaly'], fill_value=0).reset_index()
df
    date       cat    Anomaly
0 2018-12-06    a      0
1 2019-01-07    b      0
2 2019-02-06    a      1
3 2019-03-06    a      0
4 2019-04-06    b      0

tmp
    date        cat   Anomaly
0 2018-12-06     a      0
1 2019-01-07     b      1
4 2019-04-06     b      0

result
    date           Anomaly
0 2018-12-06         0.0
1 2019-01-07         1.0 
2 2019-02-06         1.0
3 2019-03-06         0.0
4 2019-04-06         0.0

What I want actually is to sum based on index and keep the category column and int dtype of Anomaly:

result
    date          cat    Anomaly
0 2018-12-06       a         0
1 2019-01-07       b         1
2 2019-02-06       a         1
3 2019-03-06       a         0
4 2019-04-06       b         0

Upvotes: 0

Views: 71

Answers (3)

Serge Ballesta
Serge Ballesta

Reputation: 149075

The answer that I had given on your previous question is still valid after minimal adjustments:

tmp1 = tmp.set_index('date')
result = df.set_index('date')
result.loc[tmp1.index, 'Anomaly'] += tmp1['Anomaly]
result.reset_index(inplace=True)

It gives:

         date cat  Anomaly
0  2018-12-06   a        0
1  2019-01-07   b        1
2  2019-02-06   a        1
3  2019-03-06   a        0
4  2019-04-06   b        0

keeping the cat column from df and keeping the type int64 for Anomaly.

Upvotes: 1

cs95
cs95

Reputation: 402814

concat and groupby

pd.concat([df, temp]).groupby(['date','cat'], as_index=False).sum()  
                                                                     
         date cat  Anomaly
0  2018-12-06   a        0
1  2019-01-07   b        1
2  2019-02-06   a        1
3  2019-03-06   a        0
4  2019-04-06   b        0
                    

set_index + add

a = df.set_index(['date', 'cat'])
b = temp.set_index(['date', 'cat'])

a.add(b, fill_value=0).reset_index().astype({'Anomaly': int})
                                                                                                      
         date cat  Anomaly
0  2018-12-06   a        0
1  2019-01-07   b        1
2  2019-02-06   a        1
3  2019-03-06   a        0
4  2019-04-06   b        0

Upvotes: 1

Quang Hoang
Quang Hoang

Reputation: 150785

Try update:

df.Anomaly.update(df.Anomaly+tmp.Anomaly)

Output:

           cat  Anomaly
date                   
2018-12-06   a        0
2019-01-07   b        1
2019-02-06   a        1
2019-03-06   a        0
2019-04-06   b        0

Upvotes: 1

Related Questions