Reputation: 2520
I want to combine two data frames with different and overlapping columns:
df1
X a Y b c
A P 1 Q 21 1.135899
B P 2 Q 22 1.093204
C P 3 Q 23 2.035373
D P 4 Q 24 0.350060
E P 5 Q 25 -0.939962
df2
a b d
A 1 21 5.5
A 1 21 3.3
A 1 21 2.1
B 2 22 0.8
B 2 22 0.5
C 3 23 1.3
C 3 23 6.5
C 3 23 7.1
I would like to combine both data frames in this way:
df3
a b c d
A 1 21 1.135899 5.5
A 1 21 1.135899 3.3
A 1 21 1.135899 2.1
B 2 22 1.093204 0.8
B 2 22 1.093204 0.5
C 3 23 2.035373 1.3
C 3 23 2.035373 6.5
C 3 23 2.035373 7.1
How can I achieve this?
Upvotes: 1
Views: 338
Reputation: 862511
For append all columns from df1
if not exist in df2
filter by difference
and join
by default left join:
df = df2.join(df1[df1.columns.difference(df2.columns)])
print (df)
a b d X Y c
A 1 21 5.5 P Q 1.135899
A 1 21 3.3 P Q 1.135899
A 1 21 2.1 P Q 1.135899
B 2 22 0.8 P Q 1.093204
B 2 22 0.5 P Q 1.093204
C 3 23 1.3 P Q 2.035373
C 3 23 6.5 P Q 2.035373
C 3 23 7.1 P Q 2.035373
And if need only some columns add subset by list
:
df = df2.join(df1[df1.columns.difference(df2.columns)])[['a','b','c','d']]
print (df)
a b c d
A 1 21 1.135899 5.5
A 1 21 1.135899 3.3
A 1 21 1.135899 2.1
B 2 22 1.093204 0.8
B 2 22 1.093204 0.5
C 3 23 2.035373 1.3
C 3 23 2.035373 6.5
C 3 23 2.035373 7.1
Detail:
print (df1[df1.columns.difference(df2.columns)])
X Y c
A P Q 1.135899
B P Q 1.093204
C P Q 2.035373
D P Q 0.350060
E P Q -0.939962
Upvotes: 0
Reputation: 164623
Try a left merge. To maintain index, you will need use reset_index
before and set_index
after the marge.
res = df2.reset_index()\
.merge(df1, how='left')\
.set_index('index')\
.loc[:, ['a', 'b', 'c', 'd']]
print(res)
# a b c d
# index
# A 1 21 1.135899 5.5
# A 1 21 1.135899 3.3
# A 1 21 1.135899 2.1
# B 2 22 1.093204 0.8
# B 2 22 1.093204 0.5
# C 3 23 2.035373 1.3
# C 3 23 2.035373 6.5
# C 3 23 2.035373 7.1
Upvotes: 2