user1883163

Reputation: 183

How to concatenate combinations of rows from two different dataframes?

I have two dataframes with different column names. I want to create a new dataframe whose columns are the columns of both dataframes combined. Its rows should be every possible pairing of a row from the first dataframe with a row from the second (the Cartesian product, so n_rows1 * n_rows2 rows, not n_rows choose 2). For example:

import pandas as pd

df1 = pd.DataFrame({'A': ['1', '2']})
df2 = pd.DataFrame({'B': ['a', 'b', 'c']})

will generate

df3 = pd.DataFrame({'A': ['1', '1', '1', '2', '2', '2'],
                    'B': ['a', 'b', 'c', 'a', 'b', 'c']})

Upvotes: 1

Views: 496

Answers (3)

Quang Hoang

Reputation: 150745

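You can do so with pd.MultiIndex.from_product, which builds an index from the Cartesian product of the two columns; reset_index() then turns the index levels back into ordinary columns: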

(pd.DataFrame(index=pd.MultiIndex.from_product([df1['A'], df2['B']], 
                                              names=['A','B']))
.reset_index())

Output:

    A   B
0   1   a
1   1   b
2   1   c
3   2   a
4   2   b
5   2   c

Upvotes: 0

Alec

Reputation: 9546

The itertools.product() function will do what you want:

import itertools
pd.DataFrame(list(itertools.product(df1.A, df2.B)), columns=['A', 'B'])

The itertools documentation describes product() as roughly equivalent to this pure-Python code:

def product(*args, repeat=1):
    # product('ABCD', 'xy') --> Ax Ay Bx By Cx Cy Dx Dy
    # product(range(2), repeat=3) --> 000 001 010 011 100 101 110 111
    pools = [tuple(pool) for pool in args] * repeat
    result = [[]]
    for pool in pools:
        result = [x+[y] for x in result for y in pool]
    for prod in result:
        yield tuple(prod)
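
As a quick check of the ordering, here is a minimal sketch that applies itertools.product to plain lists holding the values from the question. The names left, right and rows are only illustrative and are not part of the original answer.

```python
import itertools

# Values copied from df1['A'] and df2['B'] in the question.
left = ['1', '2']
right = ['a', 'b', 'c']

# product() varies the last iterable fastest, so every value of `left`
# is paired with each value of `right` in turn.
rows = list(itertools.product(left, right))

for row in rows:
    print(row)
```

Each tuple becomes one row when the list is passed to pd.DataFrame, which is why the columns argument names the tuple positions in order.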

Upvotes: 0

anky

Reputation: 75080

Use itertools.product():

import itertools
pd.DataFrame(list(itertools.product(df1.A,df2.B)),columns=['A','B'])

   A  B
0  1  a
1  1  b
2  1  c
3  2  a
4  2  b
5  2  c
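
If the dataframes have more than one column each, listing every column in product() gets awkward. A possible alternative, not part of the answer above, is a cross join. This sketch assumes pandas 1.2 or later, where DataFrame.merge accepts how='cross'.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['1', '2']})
df2 = pd.DataFrame({'B': ['a', 'b', 'c']})

# A cross join pairs every row of df1 with every row of df2 and keeps
# all columns from both frames (assumes pandas >= 1.2).
df3 = df1.merge(df2, how='cross')

print(df3)
```

The result has len(df1) * len(df2) rows, so it can grow quickly for large inputs.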

Upvotes: 3
