DAEHYUN KIM

Reputation: 99

How to get all combinations of two columns from a DataFrame with multiple columns in Python?

I have a DataFrame like this:

df = pd.DataFrame({'a':[1,2,3,3,4,4,4], 'b':[2,3,4,4,5,5,5], 'c':[5,6,7,8,5,5,5]})

I want to generate a new DataFrame from every combination of two columns. For example, I want the combinations (a,b), (a,c), (a,d), (b,c), (b,d), (c,d).

I tried this code:

col_a_list = df['a'].tolist()
col_b_list = df['b'].tolist()
col_c_list = df['c'].tolist()

import itertools
def cartesian_product(lists):
    return list(itertools.product(*lists))
ls = [col_a_list,col_b_list,col_c_list,col_d_list]
print("Original Lists:",ls)
print("Cartesian product of the said lists: ",cartesian_product(ls))

However, it does not work: it generates combinations across three columns, not two.

I want output like this:
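
To illustrate the behaviour described here, a minimal sketch (the toy lists are made up for illustration and are not the question's data): `itertools.product` draws one element from each input list, so its tuples are as long as the number of lists passed in, whereas `itertools.combinations` applied to the column names yields each pair of columns once.

```python
import itertools

# Toy stand-ins for three columns (values chosen only for illustration).
lists = [[1, 2], [3, 4], [5, 6]]

# product takes one element from EACH list, so every tuple has len(lists) items.
triples = list(itertools.product(*lists))
print(triples[0], len(triples))

# combinations over the column names gives each unordered pair exactly once.
pairs = list(itertools.combinations(['a', 'b', 'c', 'd'], 2))
print(pairs)
```

With four column names this gives six pairs, matching the list of combinations asked for in the question.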

column x  column y
1          2
1          5
1          3
1          6
1          4
1          7
.
.
.
.

Is there any way that I could generate these columns? Thanks in advance!

Upvotes: 0

Views: 583

Answers (1)

jezrael

Reputation: 863166

I believe you need itertools.combinations over the column names, then concat of the resulting two-column DataFrames:

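import itertools
import pandas as pd

df = pd.DataFrame({'a':[1,2,3,3,4,4,4], 'b':[2,3,4,4,5,5,5], 
                   'c':[5,6,7,8,5,5,5], 'd':[5,6,7,8,5,5,5]})

def comb(df):
    # One two-column DataFrame per pair of columns, renamed to a common header
    return [df.loc[:, list(x)].set_axis(['column x','column y'], axis=1) 
            for x in itertools.combinations(df.columns, 2)]

# A stable sort on the original index keeps the pair order within each source row
df1 = pd.concat(comb(df)).sort_index(kind='mergesort').reset_index(drop=True)
print(df1.head(10))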
   column x  column y
0         1         2
1         1         5
2         1         5
3         2         5
4         2         5
5         5         5
6         2         3
7         2         6
8         2         6
9         3         6
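
The same idea can be sketched step by step, which may make the logic easier to follow. This assumes the goal is to stack the row-wise values of each column pair (as in the output above) rather than to build a full Cartesian product of values; the helper name `pair_columns` is made up for this sketch.

```python
import itertools
import pandas as pd

def pair_columns(df):
    """Stack the row-wise values of every pair of columns into two columns."""
    frames = []
    for left, right in itertools.combinations(df.columns, 2):
        pair = df[[left, right]].copy()
        pair.columns = ['column x', 'column y']  # common header for concat
        frames.append(pair)
    # A stable sort on the original index interleaves the pairs row by row.
    return (pd.concat(frames)
              .sort_index(kind='mergesort')
              .reset_index(drop=True))

df = pd.DataFrame({'a': [1, 2, 3, 3, 4, 4, 4], 'b': [2, 3, 4, 4, 5, 5, 5],
                   'c': [5, 6, 7, 8, 5, 5, 5], 'd': [5, 6, 7, 8, 5, 5, 5]})
out = pair_columns(df)
print(out.shape)
```

Six column pairs over seven rows give 42 rows in total. If repeated pairs are not wanted, calling `out.drop_duplicates()` afterwards would remove them.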

Upvotes: 1
