Reputation: 64
i want to add 2 columns of 2 different dataframes based on the condition that name is same:
import pandas as pd
df1 = pd.DataFrame([("Apple",2),("Litchi",4),("Orange",6)], columns=['a','b'])
df2 = pd.DataFrame([("Apple",200),("Orange",400),("Litchi",600)], columns=['a','c'])
now i want to add column b and c if the name is same in a.
I tried this df1['b+c']=df1['b']+df2['c']
but it simply adds column b and c so the result comes as
a b b+c
0 Apple 2 202
1 Litchi 4 404
2 Orange 6 606
but i want to
a b+c
0 Apple 202
1 Litchi 604
2 Orange 406
i guess i have to use isin
but i am not getting how?
Upvotes: 2
Views: 6734
Reputation: 765
duckdb:
df1.sql.set_alias("tb1").join(df2.sql.set_alias("tb2"),condition="tb1.a=tb2.a",how="inner").select("tb1.a,b+c")
┌─────────┬─────────┐
│ a │ (b + c) │
│ varchar │ int64 │
├─────────┼─────────┤
│ Apple │ 202 │
│ Litchi │ 604 │
│ Orange │ 406 │
└─────────┴─────────┘
Upvotes: 0
Reputation: 863531
Columns b
and c
are aligned by index values in sum operation, so is necessary create index by DataFrame.set_index
by column a
:
s1 = df1.set_index('a')['b']
s2 = df2.set_index('a')['c']
df1 = (s1+s2).reset_index(name='b+c')
print (df1)
a b+c
0 Apple 202
1 Litchi 604
2 Orange 406
EDIT: If need original value for not matched values use Series.add
with parameter fill_value=0
df2 = pd.DataFrame([("Apple",200),("Apple",400),("Litchi",600)], columns=['a','c'])
print (df2)
a c
0 Apple 200
1 Apple 400
2 Litchi 600
s1 = df1.set_index('a')['b']
s2 = df2.set_index('a')['c']
df1 = s1.add(s2, fill_value=0).reset_index(name='b+c')
print (df1)
a b+c
0 Apple 202.0
1 Apple 402.0
2 Litchi 604.0
3 Orange 6.0
Upvotes: 3