I have a data frame like this:
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 3, 4, 4, 4], 'b': [2, 3, 4, 4, 5, 5, 5], 'c': [5, 6, 7, 8, 5, 5, 5]})
I want to generate a new data frame from combinations of two columns. For example, I want to find the combinations (a,b), (a,c), (a,d), (b,c), (b,d) and (c,d).
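For reference, the standard library's `itertools.combinations` yields exactly these unordered pairs when given the column names. A minimal sketch, assuming the four column names `a` to `d` listed above (the frame shown only has `a` to `c`, so `d` is an assumption here):

```python
import itertools

# Column names from the list of wanted pairs; 'd' is assumed to exist.
columns = ['a', 'b', 'c', 'd']

# combinations(iterable, 2) yields each unordered pair exactly once,
# in the order the names appear in the input.
pairs = list(itertools.combinations(columns, 2))
print(pairs)
# [('a', 'b'), ('a', 'c'), ('a', 'd'), ('b', 'c'), ('b', 'd'), ('c', 'd')]
```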
I tried this code:
import itertools

col_a_list = df['a'].tolist()
col_b_list = df['b'].tolist()
col_c_list = df['c'].tolist()

def cartesian_product(lists):
    # product(*lists) takes one element from every list in turn
    return list(itertools.product(*lists))

# The frame above has no column 'd', so only three lists are passed
ls = [col_a_list, col_b_list, col_c_list]
print("Original Lists:", ls)
print("Cartesian product of the said lists: ", cartesian_product(ls))
However, this does not work: it produces tuples that combine all three columns, not pairs of two.
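The difference can be seen directly by comparing `itertools.product` with `itertools.combinations` on the question's three columns. This is a sketch assuming the goal is to pair values row by row within each pair of columns (the reading the answer below also uses); the names `values`, `triples` and `pairs` are illustrative only.

```python
import itertools

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 3, 4, 4, 4],
                   'b': [2, 3, 4, 4, 5, 5, 5],
                   'c': [5, 6, 7, 8, 5, 5, 5]})

# Plain Python lists keyed by column name (insertion order: a, b, c).
values = {col: df[col].tolist() for col in df.columns}

# product draws one value from EVERY list, so each tuple has 3 items
# and there are 7 * 7 * 7 of them.
triples = list(itertools.product(*values.values()))

# combinations picks two column names first; zipping those two columns
# then gives one 2-tuple per row for each pair of columns.
pairs = [(x, y)
         for col_x, col_y in itertools.combinations(values, 2)
         for x, y in zip(values[col_x], values[col_y])]

print(len(triples), len(triples[0]))  # 343 3
print(len(pairs), len(pairs[0]))      # 21 2
```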
I want my output to look like this:
column x  column y
1         2
1         5
1         3
1         6
1         4
1         7
...
Is there any way that I could generate these columns? Thanks in advance!
I believe you need:
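import itertools

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 3, 4, 4, 4], 'b': [2, 3, 4, 4, 5, 5, 5],
                   'c': [5, 6, 7, 8, 5, 5, 5], 'd': [5, 6, 7, 8, 5, 5, 5]})

def comb(df):
    # One two-column frame per column pair, renamed so all of them stack
    return [df.loc[:, list(x)].set_axis(['column x', 'column y'], axis=1)
            for x in itertools.combinations(df.columns, 2)]

# A stable sort on the original index interleaves the pairs row by row
df1 = pd.concat(comb(df)).sort_index(kind='mergesort').reset_index(drop=True)
print(df1.head(10))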
column x column y
0 1 2
1 1 5
2 1 5
3 2 5
4 2 5
5 5 5
6 2 3
7 2 6
8 2 6
9 3 6
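The same frame can also be built without `concat` and the stable sort, by walking the original rows and emitting one pair per column combination. This is a sketch, not part of the original answer; the helper name `row_pairs` is hypothetical, and it assumes the four-column `df` used above.

```python
import itertools

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 3, 4, 4, 4], 'b': [2, 3, 4, 4, 5, 5, 5],
                   'c': [5, 6, 7, 8, 5, 5, 5], 'd': [5, 6, 7, 8, 5, 5, 5]})

def row_pairs(df):
    """Hypothetical helper: for each original row, emit one (x, y) pair
    per column combination, in itertools.combinations order."""
    pairs = list(itertools.combinations(df.columns, 2))
    rows = [(row[x], row[y])
            for row in df.to_dict('records')
            for x, y in pairs]
    return pd.DataFrame(rows, columns=['column x', 'column y'])

df2 = row_pairs(df)
print(df2.head(10))
```

Because rows are produced in row-major order, no sort is needed and the result matches `df1`. The trade-off is a Python-level loop over rows, which is likely slower than the vectorized `concat` approach on large frames.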