E. Jaep

Reputation: 2143

How to build a new column containing the concatenated names of the columns whose 'cell' value is True

I have a DataFrame that is defined by:

import pandas as pd

df = pd.DataFrame({'product': ['A', 'B', 'C'], 'feature_1': [1,0,0], 'feature_2': [1,1,1], 'feature_3': [0,0,1] })

Output of display(df):

  product  feature_1  feature_2  feature_3
0       A          1          1          0
1       B          0          1          0
2       C          0          1          1

My aim is to add a column that lists, for each row, the names of the feature columns whose value is 1. The end result would be a DataFrame like:

df_result = pd.DataFrame({'product': ['A', 'B', 'C'], 'feature_1': [1,0,0], 'feature_2': [1,1,1], 'feature_3': [0,0,1], 'features': ['feature_1, feature_2', 'feature_2', 'feature_2, feature_3'] })

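Output of display(df_result):

  product  feature_1  feature_2  feature_3              features
0       A          1          1          0  feature_1, feature_2
1       B          0          1          0             feature_2
2       C          0          1          1  feature_2, feature_3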

I tried using apply, but I don't think this is the right way to do it (on top of it not working...):

def get_features(row):
    for column in row.colums:
        print(column.name)

df.apply(lambda row: get_features(row))

What would be the correct way to approach this?

Upvotes: 0

Views: 61

Answers (2)

Quang Hoang

Reputation: 150735

You can do a melt, then groupby:

s = df.melt(id_vars='product')
s[s.value.eq(1)].groupby('product').variable.agg(', '.join)

Output:

product
A    feature_1, feature_2
B               feature_2
C    feature_2, feature_3
Name: variable, dtype: object
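
The result above is a Series indexed by product, not yet a column of df. A minimal sketch of one way to attach it, assuming the new column should be called 'features' as in the question's df_result (the names long and joined are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'product': ['A', 'B', 'C'],
    'feature_1': [1, 0, 0],
    'feature_2': [1, 1, 1],
    'feature_3': [0, 0, 1],
})

# Long format: one row per (product, feature) pair.
long = df.melt(id_vars='product')

# Keep the pairs whose value is 1, then join the feature names per product.
joined = long[long['value'].eq(1)].groupby('product')['variable'].agg(', '.join)

# Map the per-product strings back onto the original rows.
# A product with no feature set to 1 is absent from `joined` and would map
# to NaN, so fillna('') is used here (an assumption about the desired result).
df['features'] = df['product'].map(joined).fillna('')

print(df)
```

Note that the groupby merges rows sharing the same product, so this sketch is probably only what you want when each product appears once.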

Upvotes: 2

BENY

Reputation: 323226

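We can use dot: each 0/1 value is multiplied by its column name (plus a separator) and the products are summed, which concatenates the names of the columns set to 1.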

s = df.filter(like='feature')
df['New'] = s.dot(s.columns + ',').str[:-1]
df
Out[146]: 
  product  feature_1  feature_2  feature_3                  New
0       A          1          1          0  feature_1,feature_2
1       B          0          1          0            feature_2
2       C          0          1          1  feature_2,feature_3
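
The output above uses ',' as the separator, while the question's df_result uses ', '. A hedged variant of the same dot idea, assuming the column should be named 'features' and that every feature column contains 'feature' in its name (mask is just an illustrative name):

```python
import pandas as pd

df = pd.DataFrame({
    'product': ['A', 'B', 'C'],
    'feature_1': [1, 0, 0],
    'feature_2': [1, 1, 1],
    'feature_3': [0, 0, 1],
})

# Boolean mask of the feature columns: True where the value is 1.
mask = df.filter(like='feature').eq(1)

# True * 'name, ' gives 'name, ' and False * 'name, ' gives '',
# so the dot product concatenates the names of the set columns.
# The trailing ', ' (two characters) is then sliced off.
df['features'] = mask.dot(mask.columns + ', ').str[:-2]

print(df)
```

Rows with no feature set end up with an empty string, since slicing an empty string leaves it empty.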

Upvotes: 3
