user6089076

python: Combination of two Columns

I have isolated a column from one dataframe, using the code:

 Column_a = df1.loc[:,'Column_a_Name']

and a second column from another dataframe, equivalently using:

 Column_b = df2.loc[:,'Column_b_Name']

These columns contain names. I would like to create a list of all possible combinations that pair one name from Column_a with one name from Column_b. For example:

 Column_a         Column_b
 Adam             Smith
 Barry            Brown
 Ben              Red

The result I am trying to achieve is a dataframe of the form

 [(Adam,Smith), (Adam, Brown), (Adam,Red), (Barry, Brown),...,(Ben, Red)]

I have tried the function itertools.combinations(Column_a, Column_b), but this just raises the error: TypeError: cannot convert the series to <type 'int'>. Thanks

Upvotes: 10

Views: 24525

Answers (2)

rnso

Reputation: 24555

A list comprehension in base Python works well here:

# df.colA and df.colB stand for the two columns of names
outlist = [ (i, j)
    for i in df.colA
    for j in df.colB ]
print(outlist)

Output:

[('Adam', 'Smith'), ('Adam', 'Brown'), ('Adam', 'Red'), ('Barry', 'Smith'), ('Barry', 'Brown'), ('Barry', 'Red'), ('Ben', 'Smith'), ('Ben', 'Brown'), ('Ben', 'Red')]

This can be converted to a dataframe:

import pandas as pd

newdf = pd.DataFrame(data=outlist, columns=['first_col','second_col'])
print(newdf)

Output:

  first_col second_col
0      Adam      Smith
1      Adam      Brown
2      Adam        Red
3     Barry      Smith
4     Barry      Brown
5     Barry        Red
6       Ben      Smith
7       Ben      Brown
8       Ben        Red
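
As a possible alternative to building the list first, a cross join should give the same table directly. This is only a sketch: it assumes pandas 1.2 or later (where merge accepts how='cross'), and the series names and sample data below are made up to mirror the question.

```python
import pandas as pd

# Illustrative stand-ins for the two columns in the question
col_a = pd.Series(['Adam', 'Barry', 'Ben'], name='first_col')
col_b = pd.Series(['Smith', 'Brown', 'Red'], name='second_col')

# how='cross' pairs every row of the left frame with every row of the right frame
# (assumption: pandas >= 1.2 is installed)
newdf = col_a.to_frame().merge(col_b.to_frame(), how='cross')
print(newdf)
```

The list comprehension remains the simpler choice when a plain list of tuples is all that is needed.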

Upvotes: 10

Sahil Puri

Reputation: 491

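Use itertools.product. The second argument of itertools.combinations(iterable, r) is an integer length r, which is why passing a Series there raised the TypeError.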

>>> import pandas as pd
>>> df = pd.DataFrame(data=[['Adam', 'Smith'], ['Barry', 'Brown'], ['Ben', 'Red']], columns=['Column_a_Name', 'Column_b_Name'])
>>> df

  Column_a_Name Column_b_Name
0          Adam         Smith
1         Barry         Brown
2           Ben           Red

>>> from itertools import product

>>> list(product(df['Column_a_Name'], df['Column_b_Name']))


 [('Adam', 'Smith'),
 ('Adam', 'Brown'),
 ('Adam', 'Red'),
 ('Barry', 'Smith'),
 ('Barry', 'Brown'),
 ('Barry', 'Red'),
 ('Ben', 'Smith'),
 ('Ben', 'Brown'),
 ('Ben', 'Red')]

Note: product returns a lazy iterator rather than a list. If you only want to loop over the pairs, you don't need to wrap it in list().
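
To illustrate looping without a list, here is a minimal sketch. The plain lists and the names first_names, last_names and full_names are made up for the example; the same loop should work with the two dataframe columns.

```python
from itertools import product

# Illustrative data; in the question these would be the two dataframe columns
first_names = ['Adam', 'Barry', 'Ben']
last_names = ['Smith', 'Brown', 'Red']

# Pairs are produced one at a time as the loop asks for them
pairs = product(first_names, last_names)

full_names = []
for first, last in pairs:
    full_names.append(f"{first} {last}")

print(full_names[:3])   # ['Adam Smith', 'Adam Brown', 'Adam Red']
print(list(pairs))      # [] because the iterator is used up after one pass
```

Because the iterator can only be consumed once, call product again (or keep a list) if the pairs are needed a second time.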

Upvotes: 13
