Reputation: 170
My dataset looks like this,
Col1 Col2 Col3
A 10 x1
B 100 x2
C 1000 x3
This is the output I am currently producing:
Col1 Col2 Col3 Col4 Col5 Col6 Col7 Col8 Col9
A 10 x1 Empty Empty Empty Empty Empty Empty
B 100 x2 Empty Empty Empty Empty Empty Empty
C 1000 x3 Empty Empty Empty Empty Empty Empty
A 10 x1 B 100 x2 Empty Empty Empty
B 100 x2 C 1000 x3 Empty Empty Empty
A 10 x1 B 100 x2 C 1000 x3
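Thanks to help from this website, this can be done with:
import itertools
import pandas as pd

# Flatten every combination of rows (sizes 1..len(df)) into one wide row.
arr = list(itertools.chain.from_iterable(
    [[value for row in combo for value in row]
     for combo in itertools.combinations(df.values.tolist(), size)]
    for size in range(1, len(df) + 1)
))
pd.DataFrame(arr)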
But if the dataset was the following,
Col1 Col2 Col3 Structure
A 10 x1 1
B 100 x2 1
C 1000 x3 2
And the output needed to be this -
Col1 Col2 Col3 Col4 Col5 Col6 Col7 Col8 Col9 Answer
A 10 x1 Empty Empty Empty Empty Empty Empty No
B 100 x2 Empty Empty Empty Empty Empty Empty No
C 1000 x3 Empty Empty Empty Empty Empty Empty Yes
A 10 x1 B 100 x2 Empty Empty Empty Yes
B 100 x2 C 1000 x3 Empty Empty Empty No
A 10 x1 B 100 x2 C 1000 x3 No
In other words, the combination A, B is 'Yes' because A and B together make up one whole structure, and C on its own is 'Yes' because it is the only member of its structure. All the other rows, such as A alone, B alone, or A, B, C, are 'No' because they do not match exactly one complete structure. How do I get the desired table above?
The code,
arr = list(itertools.chain.from_iterable(
    [[value for row in combo for value in row]
     for combo in itertools.combinations(df.values.tolist(), size)]
    for size in range(1, len(df) + 1)
))
pd.DataFrame(arr)
gives me this output,
Col1 Col2 Col3 Col4 Col5 Col6 Col7 Col8 Col9
A 10 x1 Empty Empty Empty Empty Empty Empty
B 100 x2 Empty Empty Empty Empty Empty Empty
C 1000 x3 Empty Empty Empty Empty Empty Empty
A 10 x1 B 100 x2 Empty Empty Empty
B 100 x2 C 1000 x3 Empty Empty Empty
A 10 x1 B 100 x2 C 1000 x3
How do I add the 'Answer' column to this output to get the ultimate table?
Upvotes: 1
Views: 280
Reputation: 51165
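Because of the structure of the DataFrame, we know that when we apply itertools.combinations to all four columns, the Structure column will show up first at column index 3 (the fourth column), and at every fourth column after that. The resulting frame is called out in the code that follows: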
0 1 2 3 4 5 6 7 8 9 10 11
0 A 10 x1 1 None NaN None NaN None NaN None NaN
1 B 100 x2 1 None NaN None NaN None NaN None NaN
2 C 1000 x3 2 None NaN None NaN None NaN None NaN
3 A 10 x1 1 B 100.0 x2 1.0 None NaN None NaN
4 A 10 x1 1 C 1000.0 x3 2.0 None NaN None NaN
5 B 100 x2 1 C 1000.0 x3 2.0 None NaN None NaN
6 A 10 x1 1 B 100.0 x2 1.0 C 1000.0 x3 2.0
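The frame above is not constructed explicitly in this answer. A minimal sketch of one way to build it, assuming the question's sample data and reusing the question's flattening approach (the names rows and structure_cols are illustrative, not from the original):

```python
import itertools
import pandas as pd

# Sample data from the question, including the Structure column.
df = pd.DataFrame({
    "Col1": ["A", "B", "C"],
    "Col2": [10, 100, 1000],
    "Col3": ["x1", "x2", "x3"],
    "Structure": [1, 1, 2],
})

rows = df.values.tolist()

# One flattened record per combination of rows, for sizes 1..len(df).
arr = [
    [value for row in combo for value in row]
    for size in range(1, len(rows) + 1)
    for combo in itertools.combinations(rows, size)
]

# Shorter records are padded with missing values on the right.
out = pd.DataFrame(arr)

# Each source row contributes 4 values, so Structure lands at 3, 7, 11.
structure_cols = out.columns[3::4]
```

With four source columns the step is 4; if the input had a different number of columns, the start and step of the slice would need to change accordingly.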
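We can use this to select only the Structure columns, check whether each row contains all members of exactly one group, then drop them:
import numpy as np

# Number of rows in each Structure group, e.g. {1: 2, 2: 1}
checker = df.groupby('Structure').size().to_dict()

def helper(row):
    # Structure ids actually present in this combination
    u = row[~row.isnull()].values
    # Exactly one distinct structure, and every member of it is present
    return (len(np.unique(u)) == 1) and (checker[u[0]] == len(u))

s = out[out.columns[3::4]].apply(helper, axis=1).replace({False: 'No', True: 'Yes'})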
0 No
1 No
2 Yes
3 Yes
4 No
5 No
6 No
dtype: object
To drop the Structure columns and add the result as a new column:
out.drop(columns=out.columns[3::4]).assign(final=s)
0 1 2 4 5 6 8 9 10 final
0 A 10 x1 None NaN None None NaN None No
1 B 100 x2 None NaN None None NaN None No
2 C 1000 x3 None NaN None None NaN None Yes
3 A 10 x1 B 100.0 x2 None NaN None Yes
4 A 10 x1 C 1000.0 x3 None NaN None No
5 B 100 x2 C 1000.0 x3 None NaN None No
6 A 10 x1 B 100.0 x2 C 1000.0 x3 No
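As an alternative to slicing columns by position, the Yes/No flag can be decided for each combination before it is flattened. This is a different technique from the one above, sketched under the assumption that the input has the question's column names; value_cols, group_sizes, records, and answers are illustrative names:

```python
import itertools
import pandas as pd

df = pd.DataFrame({
    "Col1": ["A", "B", "C"],
    "Col2": [10, 100, 1000],
    "Col3": ["x1", "x2", "x3"],
    "Structure": [1, 1, 2],
})

value_cols = ["Col1", "Col2", "Col3"]

# Number of rows in each structure: {1: 2, 2: 1} for the sample data.
group_sizes = df["Structure"].value_counts().to_dict()

# Pair each row's values with its structure id.
rows = list(zip(df[value_cols].values.tolist(), df["Structure"].tolist()))

records, answers = [], []
for size in range(1, len(rows) + 1):
    for combo in itertools.combinations(rows, size):
        structures = {structure for _, structure in combo}
        # "Yes" only when the combination is exactly one complete structure.
        complete = len(structures) == 1 and group_sizes[combo[0][1]] == size
        records.append([v for values, _ in combo for v in values])
        answers.append("Yes" if complete else "No")

result = pd.DataFrame(records)
result.columns = [f"Col{i + 1}" for i in range(result.shape[1])]
result["Answer"] = answers
```

Because the structure ids never enter the wide frame, there is nothing to drop afterwards, and the check does not depend on how many value columns each row has.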
Upvotes: 1