Reputation: 147
I have 2 dataframes as below, some of the index values could be common between the two and I would like to add the values across the two if same index is present. The output should have all the index values present (from 1 & 2) and their cumulative values.
Build
2.1.3.13 2
2.1.3.1 1
2.1.3.15 1
2.1.3.20 1
2.1.3.8 1
2.1.3.9 1
Ref_Build
2.1.3.13 2
2.1.3.10 1
2.1.3.14 1
2.1.3.17 1
2.1.3.18 1
2.1.3.22 1
For example in the above case 2.1.3.13 should show 4 and the remaining 11 of them with 1 each.
What's the efficient way to do this? I tried merge etc., but some of those options were giving me 'intersection' and not 'union'.
Upvotes: 1
Views: 67
Reputation: 30060
You can try merge
with outer
option or concat
on columns
out = pd.merge(df1, df2, left_index=True, right_index=True, how='outer').fillna(0)
# or
out = pd.concat([df1, df2], axis=1).fillna(0)
out['sum'] = out['Build'] + out['Ref_Build']
# or with `eval` in one line
out = pd.concat([df1, df2], axis=1).fillna(0).eval('sum = Build + Ref_Build')
print(out)
Build Ref_Build sum
2.1.3.13 2.0 2.0 4.0
2.1.3.1 1.0 0.0 1.0
2.1.3.15 1.0 0.0 1.0
2.1.3.20 1.0 0.0 1.0
2.1.3.8 1.0 0.0 1.0
2.1.3.9 1.0 0.0 1.0
2.1.3.10 0.0 1.0 1.0
2.1.3.14 0.0 1.0 1.0
2.1.3.17 0.0 1.0 1.0
2.1.3.18 0.0 1.0 1.0
2.1.3.22 0.0 1.0 1.0
Upvotes: 3
Reputation: 30930
Use Series.add
and Series.fillna
df1['Build'].add(df2['Ref_Build']).fillna(df1['Build']).fillna(df2['Ref_Build'])
2.1.3.1 1.0
2.1.3.10 1.0
2.1.3.13 4.0
2.1.3.14 1.0
2.1.3.15 1.0
2.1.3.17 1.0
2.1.3.18 1.0
2.1.3.20 1.0
2.1.3.22 1.0
2.1.3.8 1.0
2.1.3.9 1.0
dtype: float64
Or:
pd.concat([df1['Build'], df2['Ref_Build']], axis=1).sum(axis=1)
2.1.3.13 4.0
2.1.3.1 1.0
2.1.3.15 1.0
2.1.3.20 1.0
2.1.3.8 1.0
2.1.3.9 1.0
2.1.3.10 1.0
2.1.3.14 1.0
2.1.3.17 1.0
2.1.3.18 1.0
2.1.3.22 1.0
dtype: float64
Upvotes: 3