Reputation: 451
I have a DataFrame with many dummy variables. Instead of many separate dummy columns, I want a single column in which each row holds a list of the names of the dummy variables that are equal to 1.
index a b c
0 1 1 1
1 0 0 1
Output:
index dummies
0 ['a','b','c']
1 ['c']
Upvotes: 0
Views: 426
Reputation: 8768
You can stack and use groupby:
df.where(df.eq(1)).stack().reset_index(level=1).groupby(level=0)['level_1'].agg(list)
or:
df.mul(df.columns).where(lambda x: x.ne('')).stack().groupby(level=0).agg(list)
or:
df.dot(df.columns + ',').str.rstrip(',').str.split(',')
Output (from the first approach):
0 [a, b, c]
1 [c]
Name: level_1, dtype: object
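For anyone who wants to run the stack/groupby idea end to end, here is a minimal sketch on the question's sample data. The column names `row`, `dummy` and `value` are my own labels, not from the answer above, and the explicit `dropna()` is an assumption added so the result does not depend on how `stack()` treats NaN by default:

```python
import pandas as pd

# Sample frame from the question: 0/1 dummy columns.
df = pd.DataFrame({"a": [1, 0], "b": [1, 0], "c": [1, 1]})

# Blank out every cell that is not 1, then reshape to long form.
# dropna() is explicit so we do not rely on stack()'s default NaN handling.
long = df.where(df.eq(1)).stack().dropna().reset_index()

# Hypothetical, more readable names for the three resulting columns.
long.columns = ["row", "dummy", "value"]

# Collect the surviving column names for each original row.
by_stack = long.groupby("row")["dummy"].agg(list)
print(by_stack)
```

Note that a row with no 1s at all has nothing left after stacking, so it will be missing from the result rather than showing up as an empty list.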
Upvotes: 0
Reputation: 777
dummies = df.apply(lambda x: [col for col in df.columns if x[col] == 1], axis=1)
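A possible variant of the same row-wise idea, sketched with a boolean mask over the underlying NumPy arrays instead of `apply`. The third, all-zero row is hypothetical and is there only to show that such rows come out as an empty list:

```python
import pandas as pd

# Question's sample data plus one hypothetical all-zero row.
df = pd.DataFrame({"a": [1, 0, 0], "b": [1, 0, 0], "c": [1, 1, 0]})

cols = df.columns.to_numpy()   # array of column names
mask = df.eq(1).to_numpy()     # boolean array, True where the dummy is 1

# For each row, keep the column names whose mask entry is True.
dummies = pd.Series(
    [cols[row].tolist() for row in mask],
    index=df.index,
    name="dummies",
)
print(dummies)
```

This keeps every row of the original index, which may or may not be what you want compared with approaches that drop rows without any 1s.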
Upvotes: 1