I want to add together data from two dataframes in this way:
>>> df1 = pd.DataFrame({'col1': [1, 2, 3], 'col2': [2, 3, 2],
...                     'col3': ['aaa', 'bbb', 'ccc']})
>>> df1
   col1  col2 col3
0     1     2  aaa
1     2     3  bbb
2     3     2  ccc
>>> df2 = pd.DataFrame({'col1': [4, 4, 5], 'col2': [4, 4, 5],
...                     'col3': ['some', 'more', 'third']})
>>> df2
   col1  col2   col3
0     4     4   some
1     4     4   more
2     5     5  third
I would like the result to be:
>>> result
   col1  col2   col3
0     4     4   some
1     4     4   more
2     9     7  third
3     1     2    aaa
4     2     3    bbb
That is: if the same col3 value appears in both dataframes, the col1 and col2 values for that entry should be added together. If it doesn't, the rows should just be concatenated. The order of the rows doesn't matter, and I don't need to keep df1 and df2; I only care about the result.
What is the best way to achieve this?
I load the data from separate CSV files that look exactly like this, so maybe there is an alternative way to do it at that stage as well? I just want to save the result as a CSV file that looks like the one above.
Let's use pd.concat to stack the rows and groupby to sum col1 and col2 for each col3 value:
pd.concat([df1, df2]).groupby('col3', as_index=False).sum()[['col1', 'col2', 'col3']]
Output:
   col1  col2   col3
0     1     2    aaa
1     2     3    bbb
2     4     4   more
3     4     4   some
4     9     7  third
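Since the data comes from CSV files and the result should be written back as CSV, here is a minimal end-to-end sketch of the same concat + groupby approach. It assumes each file has a header row col1,col2,col3. The io.StringIO objects stand in for real files, and their rows are hypothetical sample data: the first file's last row uses the key third so that one col3 value overlaps. The column selection at the end replaces reindex_axis, which was removed in pandas 1.0.

```python
import io

import pandas as pd

# Stand-ins for the two CSV files (hypothetical sample rows); with real
# files you would pass paths to pd.read_csv instead.
csv1 = io.StringIO("col1,col2,col3\n1,2,aaa\n2,3,bbb\n4,2,third\n")
csv2 = io.StringIO("col1,col2,col3\n4,4,some\n4,4,more\n5,5,third\n")

# Read every file and stack all rows into one frame.
frames = [pd.read_csv(f) for f in (csv1, csv2)]
combined = pd.concat(frames, ignore_index=True)

# Rows sharing a col3 key are collapsed by summing col1 and col2;
# as_index=False keeps col3 as a normal column, and the column
# selection restores the original column order.
result = combined.groupby('col3', as_index=False).sum()[['col1', 'col2', 'col3']]

# index=False leaves the row index out, so the output has the same layout
# as the input files. With no path argument, to_csv returns a string.
out = result.to_csv(index=False)
print(out)
```

To write a real file, call something like result.to_csv('result.csv', index=False), where the file name is only a placeholder. The rows come out sorted by col3 because groupby sorts its keys by default. The question says row order doesn't matter, so that is harmless here.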