Reputation: 2143
I have a DataFrame that is defined by:
import pandas as pd
df = pd.DataFrame({'product': ['A', 'B', 'C'], 'feature_1': [1,0,0], 'feature_2': [1,1,1], 'feature_3': [0,0,1] })
My aim is to add a new column, features, that lists for each row the names of the feature columns whose value is 1.
Typically, the end result would be a DataFrame like:
df_result = pd.DataFrame({'product': ['A', 'B', 'C'], 'feature_1': [1,0,0], 'feature_2': [1,1,1], 'feature_3': [0,0,1], 'features': ['feature_1, feature_2', 'feature_2', 'feature_2, feature_3'] })
I tried using apply, but I don't think this is the right way of doing things (on top of not working...):
def get_features(row):
    for column in row.colums:
        print(column.name)
df.apply(lambda row: get_features(row))
What would be the correct way to approach this?
Upvotes: 0
Views: 61
Reputation: 150735
You can do a melt, then groupby:
s = df.melt(id_vars='product')
s[s.value.eq(1)].groupby('product').variable.agg(', '.join)
Output:
product
A    feature_1, feature_2
B               feature_2
C    feature_2, feature_3
Name: variable, dtype: object
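The result above is a Series indexed by product, not yet a column of df. A minimal sketch of attaching it as the features column from the question, assuming each product appears in only one row (the name joined and the fillna('') fallback for rows with no active feature are illustrative choices, not part of the answer):

```python
import pandas as pd

df = pd.DataFrame({
    'product': ['A', 'B', 'C'],
    'feature_1': [1, 0, 0],
    'feature_2': [1, 1, 1],
    'feature_3': [0, 0, 1],
})

# Long format: one row per (product, feature column, value).
s = df.melt(id_vars='product')

# Keep only the active features and join their names per product.
joined = s[s['value'].eq(1)].groupby('product')['variable'].agg(', '.join)

# Look up each row's product in the joined Series.
# A product with no active feature has no entry, so fill it with ''.
df['features'] = df['product'].map(joined).fillna('')
print(df)
```

If product values can repeat across rows, grouping on the original row index instead of product would probably be safer.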
Upvotes: 2
Reputation: 323226
We can use dot:
s=df.filter(like='feature')
df['New']=s.dot(s.columns+',').str[:-1]
df
Out[146]:
  product  feature_1  feature_2  feature_3                  New
0       A          1          1          0  feature_1,feature_2
1       B          0          1          0            feature_2
2       C          0          1          1  feature_2,feature_3
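The question asks for a ', ' separator and a column named features. A small variation on the same dot idea, sketched under the assumption that the feature columns hold only 0 and 1 (the trick relies on 1 * 'name, ' keeping the string and 0 * 'name, ' giving an empty string):

```python
import pandas as pd

df = pd.DataFrame({
    'product': ['A', 'B', 'C'],
    'feature_1': [1, 0, 0],
    'feature_2': [1, 1, 1],
    'feature_3': [0, 0, 1],
})

# Select the 0/1 indicator columns before adding the new column,
# so that 'features' itself is not picked up by the filter.
s = df.filter(like='feature')

# Each row becomes the concatenation of 'name, ' for every column equal to 1;
# the slice drops the trailing ', ' (two characters).
df['features'] = s.dot(s.columns + ', ').str[:-2]
print(df)
```

Values other than 0 and 1 would repeat a name several times, so this only fits strict indicator columns.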
Upvotes: 3