Gustavo Lopes
Gustavo Lopes

Reputation: 4174

Pandas sum two dataframes based on the value of column

I have two dataframes that I want to sum along the y axis, conditionally.

For example:

df_1

a    b    value
1    1    1011
1    2    1012
2    1    1021
2    2    1022

df_2

a    b    value
9    9    99
1    2    12
2    1    21

I want to make df_1['value'] -= df_2['value'] if df_1[a] == df_2[a] & df_1[b] == df_2[b], so the output would be:

OUTPUT

a    b    value
1    1    1011
1    2    1000
2    1    1000
2    2    1022

Is there a way to achieve that instead of iterating the whole dataframe? (It's pretty big)

Upvotes: 0

Views: 2559

Answers (3)

stahamtan
stahamtan

Reputation: 858

You could also perform a left join and subtract matching values. Here is how to do that:

(pd.merge(df_1, df_2, how='left', on=['a', 'b'], suffixes=('_1', '_2'))
 .fillna(0)
 .assign(value=lambda x: x.value_1 - x.value_2)
)[['a', 'b', 'value']]

Upvotes: 3

fuglede
fuglede

Reputation: 18201

You could let

merged = pd.merge(df_1, df_2, on=['a', 'b'], left_index=True)
df_1.value[merged.index] = merged.value_x - merged.value_y

Result:

In [37]: df_1
Out[37]:
   a  b  value
0  1  1   1011
1  1  2   1000
2  2  1   1000
3  2  2   1022

Upvotes: 1

user3483203
user3483203

Reputation: 51165

Make use of index alignment that pandas provides here, by setting a and b as your index before subtracting.


for df in [df1, df2]:
    df.set_index(['a', 'b'], inplace=True)

df1.sub(df2, fill_value=0).reindex(df1.index)

      value
a b
1 1  1011.0
  2  1000.0
2 1  1000.0
  2  1022.0

Upvotes: 5

Related Questions