danche

Reputation: 1815

How to do intersection match between 2 DataFrames in Pandas?

Assume there are 2 DataFrames, A and B, like the following:

A:

a A
b B
c C

B:

1 2
3 4

How can I produce a DataFrame C like this:

a  A  1 2
a  A  3 4
b  B  1 2
b  B  3 4
c  C  1 2
c  C  3 4

Is there a function in Pandas that can do this operation?

Upvotes: 3

Views: 222

Answers (1)

jezrael

Reputation: 862611

First, all values have to be unique in each DataFrame.

I think you need product:

from itertools import product
import pandas as pd

A = pd.DataFrame({'a':list('abc')})
B = pd.DataFrame({'a':[1,2]})

C = pd.DataFrame(list(product(A['a'], B['a'])))
print (C)
   0  1
0  a  1
1  a  2
2  b  1
3  b  2
4  c  1
5  c  2
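The same itertools.product idea can be stretched to DataFrames with several columns, as in the question. This is only a minimal sketch: the names left, right and C_rows are hypothetical, and it assumes each row can be treated as a plain tuple so that two rows can be concatenated with +.

```python
from itertools import product
import pandas as pd

# Hypothetical two-column inputs mirroring the question's A and B
left = pd.DataFrame({'a': list('abc'), 'b': list('ABC')})
right = pd.DataFrame({'a': [1, 3], 'c': [2, 4]})

# itertuples(index=False, name=None) yields each row as a plain tuple
rows_left = left.itertuples(index=False, name=None)
rows_right = right.itertuples(index=False, name=None)

# Pair every left row with every right row and join the two tuples
C_rows = pd.DataFrame(
    [ra + rb for ra, rb in product(rows_left, rows_right)],
    columns=list('abcd'),
)
print(C_rows)
```

This builds the result row by row in Python, so it is likely to be slower than a merge-based approach on large inputs.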

Pure pandas solutions with MultiIndex.from_product:

mux = pd.MultiIndex.from_product([A['a'], B['a']])

C = pd.DataFrame(mux.values.tolist())
print (C)
   0  1
0  a  1
1  a  2
2  b  1
3  b  2
4  c  1
5  c  2

Alternatively, convert the MultiIndex to a DataFrame directly with to_frame:

C = mux.to_frame().reset_index(drop=True)
print (C)
   0  1
0  a  1
1  a  2
2  b  1
3  b  2
4  c  1
5  c  2

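Solution with a cross join: merge on a temporary column that assign fills with the same scalar in both DataFrames, then drop that column:

# Both frames have a column named 'a', so merge returns a_x and a_y; they are renamed below
df = pd.merge(A.assign(tmp=1), B.assign(tmp=1), on='tmp').drop(columns='tmp')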
df.columns = ['a','b']
print (df)
   a  b
0  a  1
1  a  2
2  b  1
3  b  2
4  c  1
5  c  2

EDIT:

A = pd.DataFrame({'a':list('abc'), 'b':list('ABC')})
B = pd.DataFrame({'a':[1,3], 'c':[2,4]})

print (A)
   a  b
0  a  A
1  b  B
2  c  C

print (B)
   a  c
0  1  2
1  3  4

C = pd.merge(A.assign(tmp=1), B.assign(tmp=1), on='tmp').drop(columns='tmp')
C.columns = list('abcd')
print (C)
   a  b  c  d
0  a  A  1  2
1  a  A  3  4
2  b  B  1  2
3  b  B  3  4
4  c  C  1  2
5  c  C  3  4
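On pandas 1.2 or newer, the temporary column should not be needed, because merge accepts how='cross'. A minimal sketch, assuming that pandas version and the same A and B as in the EDIT above:

```python
import pandas as pd

A = pd.DataFrame({'a': list('abc'), 'b': list('ABC')})
B = pd.DataFrame({'a': [1, 3], 'c': [2, 4]})

# how='cross' pairs every row of A with every row of B (no key column required)
C = A.merge(B, how='cross')

# Both inputs have a column 'a', so merge suffixes them as a_x / a_y; rename for clarity
C.columns = list('abcd')
print(C)
```

The row order follows A first and then B within each row of A, which matches the output requested in the question.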

Upvotes: 2
