Grendel

Reputation: 783

Combine all columns in a df with pandas (itertools)

Hello, I have a df such as:

COL1 COL2 COL3 COL4 
A    B    C    D

How can I get a new df with all pairwise combinations of columns, like this:

COL1_COL2  COL1_COL3  COL1_COL4  COL2_COL3  COL2_COL4  COL3_COL4
['A','B']  ['A','C']  ['A','D']  ['B','C']  ['B','D']  ['C','D']

I guess we could use itertools?
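As a quick check that itertools fits, here is a minimal sketch (assuming the four column names from the example above) showing that itertools.combinations over the column labels yields exactly the six headers wanted, in the same order:

```python
from itertools import combinations

cols = ['COL1', 'COL2', 'COL3', 'COL4']

# combinations(cols, 2) yields each unordered pair exactly once,
# following the order of the input sequence
names = ['_'.join(pair) for pair in combinations(cols, 2)]
print(names)
# ['COL1_COL2', 'COL1_COL3', 'COL1_COL4', 'COL2_COL3', 'COL2_COL4', 'COL3_COL4']
```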

Upvotes: 2

Views: 254

Answers (1)

piterbarg

Reputation: 8219

Indeed, itertools is useful here:

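import pandas as pd
from itertools import combinations

# the example df, plus a second row (see the note below the output)
df = pd.DataFrame({'COL1': ['A', 'a'], 'COL2': ['B', 'b'],
                   'COL3': ['C', 'c'], 'COL4': ['D', 'd']})

# each column of the original df as a pd.Series
columns = [df[c] for c in df.columns]

# one single-column df per pair of columns, holding [value1, value2] per row
column_pairs = [pd.DataFrame(
        columns=[pair[0].name + '_' + pair[1].name],
        data=pd.concat([pair[0], pair[1]], axis=1).apply(list, axis=1))
    for pair in combinations(columns, 2)]

# glue the single-column dfs together side by side
pd.concat(column_pairs, axis=1)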

produces

    COL1_COL2    COL1_COL3    COL1_COL4    COL2_COL3    COL2_COL4    COL3_COL4
--  -----------  -----------  -----------  -----------  -----------  -----------
 0  ['A', 'B']   ['A', 'C']   ['A', 'D']   ['B', 'C']   ['B', 'D']   ['C', 'D']
 1  ['a', 'b']   ['a', 'c']   ['a', 'd']   ['b', 'c']   ['b', 'd']   ['c', 'd']

(I added another row to the original df with a, b, c, d, to make sure it works in this slightly more general case)

The code is fairly straightforward. columns is a list of the original dataframe's columns, each as a pd.Series. combinations(columns, 2) enumerates all pairs of them. For each pair, pd.concat([pair[0], pair[1]], axis=1).apply(list, axis=1) turns the two columns into a single Series of row-wise [x, y] lists, and the surrounding pd.DataFrame(columns=[pair[0].name + '_' + pair[1].name], data=...) call wraps that Series in a single-column df named after both columns. Finally, pd.concat(column_pairs, axis=1) combines them all side by side.
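A possibly simpler variant, which is not the approach used above: instead of building one single-column DataFrame per pair and concatenating them, zip the two columns of each pair directly and build the result with a single DataFrame call. This is a sketch that assumes the same four-column example frame (including the extra lowercase row) and that list-valued cells are the desired output:

```python
import pandas as pd
from itertools import combinations

# Example frame: the original row plus the extra lowercase row
df = pd.DataFrame({'COL1': ['A', 'a'], 'COL2': ['B', 'b'],
                   'COL3': ['C', 'c'], 'COL4': ['D', 'd']})

# For each pair of column labels, zip the two columns row by row
# and store each row's two values as a two-element list
pairs = {
    f'{a}_{b}': [list(row) for row in zip(df[a], df[b])]
    for a, b in combinations(df.columns, 2)
}

# Reuse the original index so the rows line up with df
result = pd.DataFrame(pairs, index=df.index)
print(result)
```

The columns come out in the same order as the headers in the question, because the dict keys are inserted in the order combinations walks the column labels.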

Upvotes: 2
