How can I add together the data of two dataframes?

Question

I want to add together data from two dataframes in this way:

    >>> import pandas as pd
    >>> df1 = pd.DataFrame({'col1': [1, 2, 4], 'col2': [2, 3, 2],
    ...                     'col3': ['aaa', 'bbb', 'third']})
    >>> df1
       col1  col2   col3
    0     1     2    aaa
    1     2     3    bbb
    2     4     2  third

    >>> df2 = pd.DataFrame({'col1': [4, 4, 5], 'col2': [4, 4, 5],
    ...                     'col3': ['some', 'more', 'third']})
    >>> df2
       col1  col2   col3
    0     4     4   some
    1     4     4   more
    2     5     5  third

I would like the result to be:

    >>> result
       col1  col2   col3
    0     4     4   some
    1     4     4   more
    2     9     7  third
    3     1     2    aaa
    4     2     3    bbb

That is: if a col3 value appears in both dataframes, the col1 and col2 values for that entry should be added together. If it doesn't, the rows should just be concatenated. The order of the rows doesn't matter, and I don't need to keep df1 and df2; I only care about the result.

What is the best way to achieve this?

The data comes from separate CSV files that look exactly like that, so maybe there is an alternative way to do it as well? I just want to save the result as a CSV file that looks like the one above.

Scott Boston · Accepted Answer

Let's use pd.concat to stack the rows and groupby on col3 to sum the values.

    # reindex_axis was removed in pandas 1.0; selecting the columns restores the order
    pd.concat([df1, df2]).groupby('col3').sum().reset_index()[['col1', 'col2', 'col3']]

Output:

       col1  col2   col3
    0     1     2    aaa
    1     2     3    bbb
    2     4     4   more
    3     4     4   some
    4     9     7  third
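The question also asks about going from the CSV files to a saved CSV. Here is a minimal sketch, assuming each input has exactly the columns col1, col2 and col3. The helper name combine_csvs is hypothetical, and io.StringIO objects stand in for the real files so the example runs on its own (their values are chosen to reproduce the output above). It reads both files, applies the same concat + groupby idea, and writes the result with index=False so pandas' row labels don't end up in the file.

```python
import io

import pandas as pd


def combine_csvs(path1, path2, out_path):
    """Hypothetical helper: sum col1/col2 for rows sharing a col3 value.

    Rows whose col3 appears in only one file pass through unchanged.
    Accepts paths or file-like objects, since read_csv/to_csv take both.
    """
    df1 = pd.read_csv(path1)
    df2 = pd.read_csv(path2)
    # Stack both tables, then sum the numeric columns per col3 value.
    result = (
        pd.concat([df1, df2], ignore_index=True)
        .groupby("col3", as_index=False)[["col1", "col2"]]
        .sum()
    )
    # Put the columns back in the original order before saving.
    result = result[["col1", "col2", "col3"]]
    # index=False keeps the 0..n-1 row labels out of the CSV.
    result.to_csv(out_path, index=False)
    return result


# In-memory stand-ins for the two input files and the output file.
csv1 = io.StringIO("col1,col2,col3\n1,2,aaa\n2,3,bbb\n4,2,third\n")
csv2 = io.StringIO("col1,col2,col3\n4,4,some\n4,4,more\n5,5,third\n")
out = io.StringIO()

result = combine_csvs(csv1, csv2, out)
print(out.getvalue())
```

With real files you would pass paths instead, e.g. combine_csvs('file1.csv', 'file2.csv', 'result.csv'); those file names are placeholders.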
