Python: get combinations of unique values from columns of dataframe

Question

I have a dataframe like this:

id  a   b   c   d   e
0   a10 a11 a12 a13 a14
1   a10 a21 a12 a23 a24
2   a30 a21 a12 a33 a14
3   a30 a21 a12 a43 a44
4   a10 a51 a12 a53 a14

and I want all unique combinations of length 'x' drawn from the values in the dataframe. If the length is 3, then some of the combinations will be:

[[a10,a11,a12],[a10,a21,a12],[a10,a51,a12],[a30,a11,a12],[a30,a21,a12],[a30,a51,a12],
[a11,a12,a13],[a11,a12,a23],[a11,a12,a33],[a11,a12,a43],[a11,a12,a53],[a21,a12,a13]....]

There are only 2 constraints:

1. The length of each combination list should be equal to 'x'.
2. A single combination can contain at most one value from any one column of the dataframe.

The minimal code that constructs the dataframe is given below. Any help will be much appreciated. Thanks!

import pandas as pd

data_dict={'a':['a10','a10','a30','a30','a10'],
          'b':['a11','a21','a21','a21','a51'],
          'c':['a12','a12','a12','a12','a12'],
          'd':['a13','a23','a33','a43','a53'],
          'e':['a14','a24','a14','a44','a14']}
df1=pd.DataFrame(data_dict)

jezrael · Accepted Answer

Use combinations, then filter with a set built from each column of the DataFrame to enforce the second condition:

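from itertools import combinations
import numpy as np

# One set of values per column of df1
L = [set(df1[x]) for x in df1]
# Keep only the 3-value combinations that share at most one value with each column
a = [x for x in combinations(np.unique(df1.values.ravel()), 3)
     if all(len(set(x).intersection(y)) < 2 for y in L)]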
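
As a possible alternative (not part of the answer above), the same combinations can be built directly instead of being filtered out of a larger pool: pick 'x' columns first, then take one unique value from each picked column. The sketch below assumes that no value appears in more than one column, which holds for the sample data; the function name column_combinations is made up for illustration. It accepts a plain dict here so that it runs without pandas, and passing df1 should behave the same way, because iterating a DataFrame yields its column labels.

```python
from itertools import combinations, product


def column_combinations(columns, x):
    """Return every x-length list that holds at most one value per column.

    `columns` is a mapping of column label -> values (a dict or a DataFrame).
    Assumes a given value never occurs in two different columns.
    """
    # Unique values of each column, kept in first-seen order.
    uniques = [list(dict.fromkeys(columns[name])) for name in columns]
    result = []
    # Choose x distinct columns, then one value from each chosen column.
    for chosen in combinations(uniques, x):
        result.extend(list(values) for values in product(*chosen))
    return result


data_dict = {'a': ['a10', 'a10', 'a30', 'a30', 'a10'],
             'b': ['a11', 'a21', 'a21', 'a21', 'a51'],
             'c': ['a12', 'a12', 'a12', 'a12', 'a12'],
             'd': ['a13', 'a23', 'a33', 'a43', 'a53'],
             'e': ['a14', 'a24', 'a14', 'a44', 'a14']}

combos = column_combinations(data_dict, 3)
print(len(combos))   # number of valid combinations of length 3
print(combos[:3])    # first few, starting with columns a, b, c
```

Because the second constraint is satisfied by construction, no invalid candidates are generated, which may matter when the dataframe has many distinct values. Values within each list follow column order rather than the sorted order that np.unique produces.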
