user10141156

Reputation: 170

Pandas csv itertools combinations

My dataset looks like this,

Col1    Col2    Col3
A       10      x1
B       100     x2
C       1000    x3

This is what I am getting my output to look like,

Col1    Col2    Col3    Col4    Col5    Col6    Col7    Col8    Col9
A       10      x1      Empty   Empty   Empty   Empty   Empty   Empty
B       100     x2      Empty   Empty   Empty   Empty   Empty   Empty
C       1000    x3      Empty   Empty   Empty   Empty   Empty   Empty
A       10      x1      B       100     x2      Empty   Empty   Empty
B       100     x2      C       1000    x3      Empty   Empty   Empty
A       10      x1      B       100     x2      C       1000    x3

Thanks to help from this website, this can be done with -

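import itertools
import pandas as pd

# For every combination size r, flatten each combination of rows into one list
arr = list(itertools.chain.from_iterable(
    [[value for row in combo for value in row]
     for combo in itertools.combinations(df.values.tolist(), r)]
    for r in range(1, len(df) + 1)
))

pd.DataFrame(arr)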

But if the dataset was the following,

        Col1 Col2   Col3   Structure
        A    10     x1     1
        B    100    x2     1
        C    1000   x3     2

And the output needed to be this -

  Col1    Col2    Col3      Col4    Col5    Col6    Col7    Col8    Col9    Answer
    A       10      x1      Empty   Empty   Empty   Empty   Empty   Empty   No
    B       100     x2      Empty   Empty   Empty   Empty   Empty   Empty   No
    C       1000    x3      Empty   Empty   Empty   Empty   Empty   Empty   Yes
    A       10      x1      B       100     x2      Empty   Empty   Empty   Yes
    B       100     x2      C       1000    x3      Empty   Empty   Empty   No
    A       10      x1      B       100     x2      C       1000    x3      No

In other words, the A, B row is 'Yes' because A and B are in the same structure, and the C row is 'Yes' because C is in a structure by itself. All the other rows, such as A alone, B alone, and A, B, C, are 'No' because they do not contain exactly the members of one structure. How do I get the desired table above?

The code above gives me the output without the 'Answer' column. How do I add the 'Answer' column to that output to get the final table?

Upvotes: 1

Views: 280

Answers (1)

user3483203

Reputation: 51165

Because of the structure of the DataFrame, we know that when we apply itertools.combinations, the Structure values will first show up in column 3 (the fourth column), and then in every fourth column after that:

  0     1   2   3     4       5     6    7     8       9     10   11
0  A    10  x1   1  None     NaN  None  NaN  None     NaN  None  NaN
1  B   100  x2   1  None     NaN  None  NaN  None     NaN  None  NaN
2  C  1000  x3   2  None     NaN  None  NaN  None     NaN  None  NaN
3  A    10  x1   1     B   100.0    x2  1.0  None     NaN  None  NaN
4  A    10  x1   1     C  1000.0    x3  2.0  None     NaN  None  NaN
5  B   100  x2   1     C  1000.0    x3  2.0  None     NaN  None  NaN
6  A    10  x1   1     B   100.0    x2  1.0     C  1000.0    x3  2.0
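
For reference, here is a minimal sketch of how a frame like `out` could be built, assuming `df` is the four-column sample from the question with the Structure column left in place. The names `width`, `structure_pos` and `structure_cols` are only illustrative; they show where the `3::4` slice used below comes from.

```python
import itertools
import pandas as pd

# Sample data from the question, including the Structure column
df = pd.DataFrame({
    "Col1": ["A", "B", "C"],
    "Col2": [10, 100, 1000],
    "Col3": ["x1", "x2", "x3"],
    "Structure": [1, 1, 2],
})

rows = df.values.tolist()

# One flattened list per combination of rows, for every combination size
arr = [
    [value for row in combo for value in row]
    for r in range(1, len(rows) + 1)
    for combo in itertools.combinations(rows, r)
]

# Shorter lists are padded on the right with missing values
out = pd.DataFrame(arr)

# Each original row occupies `width` columns, so Structure repeats every `width`
width = df.shape[1]
structure_pos = df.columns.get_loc("Structure")
structure_cols = out.columns[structure_pos::width]
print(list(structure_cols))
```

With the sample data, `width` is 4 and `structure_pos` is 3, which is the same as slicing with `out.columns[3::4]`.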

We can use this to index only the Structure columns, check if they contain all members of a group, then drop them:

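import numpy as np

# Number of rows in each Structure group, e.g. {1: 2, 2: 1}
checker = df.groupby('Structure').size().to_dict()

def helper(row):
    # Keep only the Structure values that are present in this combination
    u = row.dropna().values
    # True only if every value is the same group and the whole group is present
    return (len(np.unique(u)) == 1) and (checker[u[0]] == len(u))

# `out` is the combinations DataFrame shown above
s = out[out.columns[3::4]].apply(helper, axis=1).map({False: 'No', True: 'Yes'})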

0     No
1     No
2    Yes
3    Yes
4     No
5     No
6     No
dtype: object

To drop the other columns and assign to the DataFrame:

out.drop(columns=out.columns[3::4]).assign(final=s)

   0     1   2     4       5     6     8       9    10 final
0  A    10  x1  None     NaN  None  None     NaN  None    No
1  B   100  x2  None     NaN  None  None     NaN  None    No
2  C  1000  x3  None     NaN  None  None     NaN  None   Yes
3  A    10  x1     B   100.0    x2  None     NaN  None   Yes
4  A    10  x1     C  1000.0    x3  None     NaN  None    No
5  B   100  x2     C  1000.0    x3  None     NaN  None    No
6  A    10  x1     B   100.0    x2     C  1000.0    x3    No
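
As an alternative sketch (not the approach above), the Yes/No label can be computed while the combinations are generated, by comparing each combination of row labels against the set of labels in each Structure group. This avoids relying on column positions. The names `groups`, `value_cols`, `flat_rows` and `result` are my own, and the sample `df` is assumed to match the question.

```python
import itertools
import pandas as pd

# Sample data from the question, including the Structure column
df = pd.DataFrame({
    "Col1": ["A", "B", "C"],
    "Col2": [10, 100, 1000],
    "Col3": ["x1", "x2", "x3"],
    "Structure": [1, 1, 2],
})

# Set of row labels for each complete Structure group
groups = {frozenset(labels) for labels in df.groupby("Structure").groups.values()}

value_cols = ["Col1", "Col2", "Col3"]
flat_rows, answers = [], []
for r in range(1, len(df) + 1):
    for combo in itertools.combinations(df.index, r):
        # Flatten the selected rows (without Structure) into a single list
        flat_rows.append(df.loc[list(combo), value_cols].to_numpy().ravel().tolist())
        # 'Yes' only when the combination is exactly one whole group
        answers.append("Yes" if frozenset(combo) in groups else "No")

result = pd.DataFrame(flat_rows)
result.columns = [f"Col{i + 1}" for i in range(result.shape[1])]
result["Answer"] = answers
print(result)
```

Because the label is decided from the row labels rather than from the flattened values, there are no Structure columns to drop afterwards.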

Upvotes: 1
