Reputation: 783
Hello, I have a df such as:
COL1 COL2 COL3 COL4
A B C D
How can I get a new df with all pairwise combinations of the columns, like this:
COL1_COL2  COL1_COL3  COL1_COL4  COL2_COL3  COL2_COL4  COL3_COL4
['A','B']  ['A','C']  ['A','D']  ['B','C']  ['B','D']  ['C','D']
I guess we could use itertools?
Upvotes: 2
Views: 254
Reputation: 8219
Indeed, itertools is useful here:
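import pandas as pd
from itertools import combinations

# Each column of the original dataframe, as a pd.Series
columns = [df[c] for c in df.columns]
# One single-column dataframe per pair of columns
column_pairs = [
    pd.DataFrame(
        columns=[pair[0].name + '_' + pair[1].name],
        data=pd.concat([pair[0], pair[1]], axis=1).apply(list, axis=1),
    )
    for pair in combinations(columns, 2)
]
# Put the single-column dataframes side by side
pd.concat(column_pairs, axis=1)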
produces
    COL1_COL2    COL1_COL3    COL1_COL4    COL2_COL3    COL2_COL4    COL3_COL4
--  -----------  -----------  -----------  -----------  -----------  -----------
 0  ['A', 'B']   ['A', 'C']   ['A', 'D']   ['B', 'C']   ['B', 'D']   ['C', 'D']
 1  ['a', 'b']   ['a', 'c']   ['a', 'd']   ['b', 'c']   ['b', 'd']   ['c', 'd']
(I added another row to the original df with a, b, c, d, to make sure it works in this slightly more general case)
The code is fairly straightforward. columns is a list of the columns of the original dataframe, each as a pd.Series. combinations(columns, 2) enumerates all pairs of those. The expression pd.DataFrame(columns = [pair[0].name + '_' + pair[1].name], data= pd.concat([pair[0],pair[1]],axis=1).apply(list,axis=1)) combines the first and second columns from the tuple pair into a single-column df with the combined name and the paired values. Finally, pd.concat(column_pairs, axis = 1) combines them all together.
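For anyone who wants to run this end to end, here is a minimal self-contained sketch. It is a variation, not the exact code above: it builds the result with a dict comprehension over column-name pairs instead of concatenating single-column dataframes. The sample df (the question's row plus the extra lowercase row) and the name pairs_df are assumptions made for illustration.

```python
from itertools import combinations

import pandas as pd

# Sample data (assumed): the question's row plus the extra lowercase row.
df = pd.DataFrame(
    {"COL1": ["A", "a"], "COL2": ["B", "b"], "COL3": ["C", "c"], "COL4": ["D", "d"]}
)

# combinations() yields column-name pairs in order: (COL1, COL2), (COL1, COL3), ...
# Each pair becomes one column whose cells are two-element lists.
pairs_df = pd.DataFrame(
    {
        f"{left}_{right}": df[[left, right]].to_numpy().tolist()
        for left, right in combinations(df.columns, 2)
    },
    index=df.index,
)

print(pairs_df)
```

Working with column names rather than pd.Series objects keeps the naming step simple, and for 4 columns it should give the same 6 pair columns as the version above.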
Upvotes: 2