Night Walker
Night Walker

Reputation: 21270

Concatenate data frames by column values

How I can merge following two data frames on columns A and B:

df1
A    B    C
1    2    3
2    8    2
4    7    9

df2
A    B    C
5    6    7
2    8    9

And with result to get only results of those two matching rows.

df3
A    B    C
2    8    2
2    8    9

Upvotes: 1

Views: 92

Answers (3)

lmo
lmo

Reputation: 38500

Here is a less efficient method that should preserve duplicates, but involves two merge/joins

# create a merged DataFrame with variables C_x and C_y with the C values
temp = pd.merge(df1, df2, how='inner', on=['A', 'B'])
# join columns A and B to a stacked DataFrame with the Cs on index
temp[['A', 'B']].join(
            pd.DataFrame({'C':temp[['C_x', 'C_y']].stack()
               .reset_index(level=1, drop=True)})).reset_index(drop=True)

This returns

   A  B  C
0  2  8  2
1  2  8  9

Upvotes: 0

jezrael
jezrael

Reputation: 862671

Solution with Index.intersection, then select values in both DataFrames by loc and last concat them together:

df1.set_index(['A','B'], inplace=True)
df2.set_index(['A','B'], inplace=True)
idx = df1.index.intersection(df2.index)
print (idx)
MultiIndex(levels=[[2], [8]],
           labels=[[0], [0]],
           names=['A', 'B'],
           sortorder=0)

df = pd.concat([df1.loc[idx],df2.loc[idx]]).reset_index()
print (df)
   A  B  C
0  2  8  2
1  2  8  9

Upvotes: 1

user2285236
user2285236

Reputation:

You can concatenate them and drop the ones that are not duplicated:

conc = pd.concat([df1, df2])
conc[conc.duplicated(subset=['A', 'B'], keep=False)]
Out: 
   A  B  C
1  2  8  2
1  2  8  9

If you have duplicates,

df1
Out: 
   A  B  C
0  1  2  3
1  2  8  2
2  4  7  9
3  4  7  9
4  2  8  5

df2
Out: 
   A  B   C
0  5  6   7
1  2  8   9
3  5  6   4
4  2  8  10

You can keep track of the duplicated ones via boolean arrays:

cols = ['A', 'B']
bool1 = df1[cols].isin(df2[cols].to_dict('l')).all(axis=1)
bool2 = df2[cols].isin(df1[cols].to_dict('l')).all(axis=1)

pd.concat([df1[bool1], df2[bool2]])
Out: 
   A  B   C
1  2  8   2
4  2  8   5
1  2  8   9
4  2  8  10

Upvotes: 4

Related Questions