Reputation: 424
I have two dataframes with different indexing that I want to sum the same column from the two dataframes.
Based on a suggestion, I tried the following but removes disregards other columns like cat
df = df.set_index('date')
tmp = tmp.set_index('date')
result = df['Anomaly'].add(tmp['Anomaly'], fill_value=0).reset_index()
df
date cat Anomaly
0 2018-12-06 a 0
1 2019-01-07 b 0
2 2019-02-06 a 1
3 2019-03-06 a 0
4 2019-04-06 b 0
tmp
date cat Anomaly
0 2018-12-06 a 0
1 2019-01-07 b 1
4 2019-04-06 b 0
result
date Anomaly
0 2018-12-06 0.0
1 2019-01-07 1.0
2 2019-02-06 1.0
3 2019-03-06 0.0
4 2019-04-06 0.0
What I want actually is to sum based on index and keep the category column and int
dtype of Anomaly
:
result
date cat Anomaly
0 2018-12-06 a 0
1 2019-01-07 b 1
2 2019-02-06 a 1
3 2019-03-06 a 0
4 2019-04-06 b 0
Upvotes: 0
Views: 71
Reputation: 149075
The answer that I had given on your previous question is still valid after minimal adjustments:
tmp1 = tmp.set_index('date')
result = df.set_index('date')
result.loc[tmp1.index, 'Anomaly'] += tmp1['Anomaly]
result.reset_index(inplace=True)
It gives:
date cat Anomaly
0 2018-12-06 a 0
1 2019-01-07 b 1
2 2019-02-06 a 1
3 2019-03-06 a 0
4 2019-04-06 b 0
keeping the cat
column from df and keeping the type int64 for Anomaly
.
Upvotes: 1
Reputation: 402814
concat
and groupby
pd.concat([df, temp]).groupby(['date','cat'], as_index=False).sum()
date cat Anomaly
0 2018-12-06 a 0
1 2019-01-07 b 1
2 2019-02-06 a 1
3 2019-03-06 a 0
4 2019-04-06 b 0
set_index
+ add
a = df.set_index(['date', 'cat'])
b = temp.set_index(['date', 'cat'])
a.add(b, fill_value=0).reset_index().astype({'Anomaly': int})
date cat Anomaly
0 2018-12-06 a 0
1 2019-01-07 b 1
2 2019-02-06 a 1
3 2019-03-06 a 0
4 2019-04-06 b 0
Upvotes: 1
Reputation: 150785
Try update
:
df.Anomaly.update(df.Anomaly+tmp.Anomaly)
Output:
cat Anomaly
date
2018-12-06 a 0
2019-01-07 b 1
2019-02-06 a 1
2019-03-06 a 0
2019-04-06 b 0
Upvotes: 1