Reputation: 889
I'm looking for a faster way to pair each value in the first column with every value in the second column.
This can be done by looping over all the values, but my dataset is quite big and the loop takes a long time to finish.
Here's a reproducible example:
import pandas as pd
df = pd.DataFrame({'col_1':['a','b','c'], 'col_2':['d','e','f']})
df
col_1 col_2
0 a d
1 b e
2 c f
I want a new DataFrame that looks like this:
col_1
0 a d
1 a e
2 a f
3 b d
4 b e
5 b f
6 c d
7 c e
8 c f
I can't quite come up with the right term to search for.
There might be a vectorized approach or a pandas method that does this. Answers or links to similar questions would be appreciated.
Thanks in advance :)
Upvotes: 2
Views: 238
Reputation: 42926
Using DataFrame.merge:
# Constant helper key: every row of the left frame matches every row of the right frame
df['key'] = 1
mrg = df[['col_1', 'key']].merge(df[['col_2', 'key']], on='key').drop(columns='key')
mrg
col_1 col_2
0 a d
1 a e
2 a f
3 b d
4 b e
5 b f
6 c d
7 c e
8 c f
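On pandas 1.2 or later, merge also accepts how='cross', which builds the same Cartesian product without the helper key column and without modifying df. A minimal sketch, assuming pandas >= 1.2 and the example df from the question:

```python
import pandas as pd

df = pd.DataFrame({'col_1': ['a', 'b', 'c'], 'col_2': ['d', 'e', 'f']})

# how='cross' pairs every left row with every right row (pandas >= 1.2),
# keeping the order of the left frame, so no temporary 'key' column is needed.
mrg = df[['col_1']].merge(df[['col_2']], how='cross')
print(mrg)
```

The result has len(col_1) * len(col_2) rows, in the same order as the output above.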
Upvotes: 4
Reputation: 75120
Use itertools.product here:
import itertools
# Join every (col_1, col_2) pair into a single "x y" string
pd.DataFrame([' '.join(i) for i in itertools.product(df.col_1, df.col_2)], columns=['col1'])
col1
0 a d
1 a e
2 a f
3 b d
4 b e
5 b f
6 c d
7 c e
8 c f
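Since the question asks for a vectorized approach, here is a sketch using NumPy's repeat and tile. This is an alternative not taken from the answers above, and it assumes both columns fit in memory (the result has len(col_1) * len(col_2) rows). The names left, right, out and joined are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col_1': ['a', 'b', 'c'], 'col_2': ['d', 'e', 'f']})

left = df['col_1'].to_numpy()
right = df['col_2'].to_numpy()

out = pd.DataFrame({
    # Repeat each col_1 value len(right) times: a a a b b b c c c
    'col_1': np.repeat(left, len(right)),
    # Tile the whole col_2 sequence len(left) times: d e f d e f d e f
    'col_2': np.tile(right, len(left)),
})

# Single joined column, matching the itertools.product output above
joined = pd.DataFrame({'col1': out['col_1'] + ' ' + out['col_2']})
print(joined)
```

Both repeat and tile work on whole arrays at once, so there is no Python-level loop over the pairs; only the final string concatenation touches every element.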
Upvotes: 2