I have isolated a column from one dataframe, using the code:
Column_a = df1.loc[:,'Column_a_Name']
and a second column from another dataframe, equivalently using:
Column_b = df2.loc[:,'Column_b_Name']
These columns contain names. I would like to create a list of all possible pairings of a name from Column_a with a name from Column_b. For example:
Column_a Column_b
Adam Smith
Barry Brown
Ben Red
The result I am trying to achieve is a dataframe (or list) of the form
[(Adam, Smith), (Adam, Brown), (Adam, Red), (Barry, Smith), ..., (Ben, Red)]
I have tried itertools.combinations(Column_a, Column_b), but this raises TypeError: cannot convert the series to <type 'int'>. Thanks
Upvotes: 10
Views: 24525
Reputation: 24555
A list comprehension in plain Python works well here:
outlist = [ (i, j)
for i in df.colA
for j in df.colB ]
print(outlist)
Output:
[('Adam', 'Smith'), ('Adam', 'Brown'), ('Adam', 'Red'), ('Barry', 'Smith'), ('Barry', 'Brown'), ('Barry', 'Red'), ('Ben', 'Smith'), ('Ben', 'Brown'), ('Ben', 'Red')]
This can be converted to a DataFrame:
newdf = pd.DataFrame(data=outlist, columns=['first_col','second_col'])
print(newdf)
Output:
first_col second_col
0 Adam Smith
1 Adam Brown
2 Adam Red
3 Barry Smith
4 Barry Brown
5 Barry Red
6 Ben Smith
7 Ben Brown
8 Ben Red
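The snippets above use a single df with colA and colB, while the question has the two columns in separate dataframes. A minimal self-contained sketch for that case, assuming df1 and df2 are built as below (the sample data is taken from the question; the construction of the two frames is an assumption for illustration):

```python
import pandas as pd

# Assumed sample frames, mirroring the names used in the question.
df1 = pd.DataFrame({"Column_a_Name": ["Adam", "Barry", "Ben"]})
df2 = pd.DataFrame({"Column_b_Name": ["Smith", "Brown", "Red"]})

# Isolate the two columns exactly as in the question.
Column_a = df1.loc[:, "Column_a_Name"]
Column_b = df2.loc[:, "Column_b_Name"]

# Pair every value of Column_a with every value of Column_b.
outlist = [(i, j) for i in Column_a for j in Column_b]

# Turn the list of tuples into a two-column DataFrame.
newdf = pd.DataFrame(data=outlist, columns=["first_col", "second_col"])
print(newdf)
```

The two columns do not need to have the same length; the result has len(Column_a) * len(Column_b) rows.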
Upvotes: 10
Reputation: 491
Use itertools.product
>>> df = pd.DataFrame(data=[['Adam', 'Smith'], ['Barry', 'Brown'], ['Ben', 'Red']], columns=['Column_a_Name', 'Column_b_Name'])
>>> df
Column_a_Name Column_b_Name
0 Adam Smith
1 Barry Brown
2 Ben Red
>>> from itertools import product
>>> list(product(df['Column_a_Name'], df['Column_b_Name']))
[('Adam', 'Smith'),
('Adam', 'Brown'),
('Adam', 'Red'),
('Barry', 'Smith'),
('Barry', 'Brown'),
('Barry', 'Red'),
('Ben', 'Smith'),
('Ben', 'Brown'),
('Ben', 'Red')]
Note: product returns a lazy iterator, not a list. If you only want to loop over the pairs, you can iterate over it directly without calling list().
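As a rough illustration of that note, and of why the attempt in the question failed: itertools.combinations has the signature combinations(iterable, r), where r is an integer length, so passing a second Series as r is what triggers the TypeError. The sketch below uses two stand-alone Series (col_a and col_b are names assumed here for illustration, holding the sample data from the question):

```python
from itertools import combinations, product

import pandas as pd

# Assumed stand-ins for the two columns in the question.
col_a = pd.Series(["Adam", "Barry", "Ben"])
col_b = pd.Series(["Smith", "Brown", "Red"])

# product is lazy: pairs are produced one at a time as you iterate.
pairs = product(col_a, col_b)
first_pair = next(pairs)           # consumes only the first pair
remaining = sum(1 for _ in pairs)  # consumes the rest without storing them

# combinations(iterable, r) picks r items from ONE iterable;
# the second argument must be an integer, not another Series.
same_column_pairs = list(combinations(col_a, 2))

print(first_pair, remaining, same_column_pairs)
```

So combinations is the right tool for pairs drawn from within a single column, and product for pairs drawn across two columns.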
Upvotes: 13