How to build a new column that contains the concatenation of the column names where the value of the 'cell' is True

Question

I have a DataFrame that is defined by:

import pandas as pd

df = pd.DataFrame({'product': ['A', 'B', 'C'], 'feature_1': [1,0,0], 'feature_2': [1,1,1], 'feature_3': [0,0,1] })

My aim is to:

add a column called features
fill in this column with the name of all the columns where the value of the cell is equal to one

Typically, the end result would be a DataFrame like:

df_result = pd.DataFrame({'product': ['A', 'B', 'C'], 'feature_1': [1,0,0], 'feature_2': [1,1,1], 'feature_3': [0,0,1], 'features': ['feature_1, feature_2', 'feature_2', 'feature_2, feature_3'] })

I tried using apply, but I don't think this is the right way of doing things (on top of it not working...):

def get_features(row):
    for column in row.colums:
        print(column.name)

df.apply(lambda row: get_features(row))

What would be the correct way to approach this?

Quang Hoang · Accepted Answer

You can melt the frame into long format, keep the rows where the value equals 1, then group by product and join the column names:

s = df.melt(id_vars='product')
s[s.value.eq(1)].groupby('product').variable.agg(', '.join)

Output:

product
A    feature_1, feature_2
B               feature_2
C    feature_2, feature_3
Name: variable, dtype: object
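
The result above is a Series indexed by product, not yet a column of df. A minimal sketch of one way to attach it as the features column follows, assuming each product value identifies one row (otherwise rows sharing a product would share the same joined string). The names feature_cols and features_by_apply are introduced here for illustration only, and the apply variant is an alternative added for comparison, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'product': ['A', 'B', 'C'],
    'feature_1': [1, 0, 0],
    'feature_2': [1, 1, 1],
    'feature_3': [0, 0, 1],
})

# Long format: one row per (product, column name, value).
s = df.melt(id_vars='product')

# Keep the cells equal to 1 and join the column names per product.
features = s[s['value'].eq(1)].groupby('product')['variable'].agg(', '.join)

# Map the per-product strings back onto the original rows.
# Products with no feature set to 1 are absent from `features`,
# so they get an empty string here (an assumption about the desired result).
df['features'] = df['product'].map(features).fillna('')

# Alternative sketch: a row-wise apply. Note axis=1, which the attempt
# in the question omitted; each row is a Series, so it has no .columns.
feature_cols = ['feature_1', 'feature_2', 'feature_3']
features_by_apply = df[feature_cols].apply(
    lambda row: ', '.join(col for col, value in row.items() if value == 1),
    axis=1,
)

print(df)
```

The melt version stays vectorised apart from the final string join, while the apply version is easier to read but calls a Python function once per row, so it will likely be slower on large frames.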
