Reputation: 21270
How I can merge following two data frames on columns A
and B
:
df1
A B C
1 2 3
2 8 2
4 7 9
df2
A B C
5 6 7
2 8 9
And with result to get only results of those two matching rows.
df3
A B C
2 8 2
2 8 9
Upvotes: 1
Views: 92
Reputation: 38500
Here is a less efficient method that should preserve duplicates, but involves two merge/joins
# create a merged DataFrame with variables C_x and C_y with the C values
temp = pd.merge(df1, df2, how='inner', on=['A', 'B'])
# join columns A and B to a stacked DataFrame with the Cs on index
temp[['A', 'B']].join(
pd.DataFrame({'C':temp[['C_x', 'C_y']].stack()
.reset_index(level=1, drop=True)})).reset_index(drop=True)
This returns
A B C
0 2 8 2
1 2 8 9
Upvotes: 0
Reputation: 862671
Solution with Index.intersection
, then select values in both DataFrames
by loc
and last concat
them together:
df1.set_index(['A','B'], inplace=True)
df2.set_index(['A','B'], inplace=True)
idx = df1.index.intersection(df2.index)
print (idx)
MultiIndex(levels=[[2], [8]],
labels=[[0], [0]],
names=['A', 'B'],
sortorder=0)
df = pd.concat([df1.loc[idx],df2.loc[idx]]).reset_index()
print (df)
A B C
0 2 8 2
1 2 8 9
Upvotes: 1
Reputation:
You can concatenate them and drop the ones that are not duplicated:
conc = pd.concat([df1, df2])
conc[conc.duplicated(subset=['A', 'B'], keep=False)]
Out:
A B C
1 2 8 2
1 2 8 9
If you have duplicates,
df1
Out:
A B C
0 1 2 3
1 2 8 2
2 4 7 9
3 4 7 9
4 2 8 5
df2
Out:
A B C
0 5 6 7
1 2 8 9
3 5 6 4
4 2 8 10
You can keep track of the duplicated ones via boolean arrays:
cols = ['A', 'B']
bool1 = df1[cols].isin(df2[cols].to_dict('l')).all(axis=1)
bool2 = df2[cols].isin(df1[cols].to_dict('l')).all(axis=1)
pd.concat([df1[bool1], df2[bool2]])
Out:
A B C
1 2 8 2
4 2 8 5
1 2 8 9
4 2 8 10
Upvotes: 4