Reputation: 351
I have a dataset like the df below:
Person atrib1 atrib2 atrib3 atrib4
Paulo 0 1 0 1
Andres 1 1 0 1
I want to create a new column, atrib_list, containing a list of the atrib columns whose value is 1, like this output:
Person atrib1 atrib2 atrib3 atrib4 atrib_list
Paulo 0 1 0 1 ['atrib2','atrib4']
Andres 1 1 0 1 ['atrib1','atrib2','atrib4']
I am trying something like this:
df['atrib_list'] = df.apply(lambda x: x for x in df.columns if df.value==1)
but it's completely wrong.
Upvotes: 0
Views: 35
Reputation: 260390
You can use stack:
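import pandas as pd

# Replace 0 with NA so that stack() (with its default dropna=True) skips those
# cells, then collect the remaining column names (index level 1) per row.
df['atrib_list'] = (df
 .filter(like='atrib').replace(0, pd.NA)
 .stack().reset_index(1)
 .groupby(level=0)['level_1'].agg(list)
)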
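For anyone who wants to run the stack idea end to end, here is a minimal sketch on the question's data. It is a variation, not the exact chain above: it filters on `== 1` explicitly instead of relying on NA values being dropped, and the axis names `row` and `atrib` as well as the intermediate variable names are my own illustrative choices.

```python
import pandas as pd

# Sample frame rebuilt from the question.
df = pd.DataFrame({
    "Person": ["Paulo", "Andres"],
    "atrib1": [0, 1],
    "atrib2": [1, 1],
    "atrib3": [0, 0],
    "atrib4": [1, 1],
})

# Long format: one entry per (row, column) pair.
# "row" and "atrib" are illustrative level names, not required by pandas.
long = df.filter(like="atrib").stack().rename_axis(["row", "atrib"])

# Keep only the cells equal to 1 and move the column names into a regular column.
ones = long[long == 1].reset_index("atrib")

# Collect the surviving column names per original row.
atrib_list = ones.groupby(level="row")["atrib"].agg(list)

# Assignment aligns on the row index.
df["atrib_list"] = atrib_list
print(df)
```

One caveat with this shape of solution: a row with no 1 at all has no group, so it ends up as NaN after the assignment rather than as an empty list.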
Another idea, using itertools.compress:
from itertools import compress
cols = list(df.filter(like='atrib'))
df['atrib_list'] = df[cols].apply(lambda x: list(compress(cols, x)), axis=1)
output:
Person atrib1 atrib2 atrib3 atrib4 atrib_list
0 Paulo 0 1 0 1 [atrib2, atrib4]
1 Andres 1 1 0 1 [atrib1, atrib2, atrib4]
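In case compress is unfamiliar, this small sketch shows what it does for a single row, with the row written out as a plain list (the variable names are only for illustration):

```python
from itertools import compress

cols = ["atrib1", "atrib2", "atrib3", "atrib4"]
row = [0, 1, 0, 1]  # Paulo's values from the question

# compress keeps each item of cols whose matching selector in row is truthy.
selected = list(compress(cols, row))
print(selected)  # ['atrib2', 'atrib4']
```

Note that compress tests truthiness, so any non-zero value would be kept, not only 1. That is fine for 0/1 flags like the ones in the question.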
Upvotes: 1
Reputation: 323226
You can fix your attempt with a list comprehension inside apply:
df['new'] = df.apply(lambda y: [z for x,z in zip(y,y.index) if x==1] ,axis=1)
Out[308]:
0 [atrib2, atrib4]
1 [atrib1, atrib2, atrib4]
dtype: object
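A self-contained sketch of the same list-comprehension idea, restricted to the atrib columns so that the Person column is never compared against 1. Selecting the columns with `startswith("atrib")` is my assumption about how they are named, and `atrib_list` is the column name taken from the question.

```python
import pandas as pd

# Sample frame rebuilt from the question.
df = pd.DataFrame({
    "Person": ["Paulo", "Andres"],
    "atrib1": [0, 1],
    "atrib2": [1, 1],
    "atrib3": [0, 0],
    "atrib4": [1, 1],
})

# Assumption: every flag column starts with "atrib".
cols = [c for c in df.columns if c.startswith("atrib")]

# For each row, keep the names of the columns whose value equals 1.
df["atrib_list"] = df[cols].apply(
    lambda row: [name for name, value in row.items() if value == 1],
    axis=1,
)
print(df)
```

Unlike a groupby-based approach, a row without any 1 simply gets an empty list here.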
Upvotes: 1