Reputation: 157
Need to create a new column that concatenate multiple columns based on the value of each column. For example, input:
s1 s2 s3
1 0 0
1 1 0
0 1 2
Output:
s1 s2 s3 col
1 0 0 s1
1 1 0 s1, s2
0 1 2 s2, s3
Basically I need to output a column name when the value is >0, and only output it once even if the number is more than 1.
The code I used but didn't work:
df['col'] = 's1' * min(df['s1'], 1) + ', ' + 's2' * min(df['s2'], 1) + ', ' + 's3' * min(df['s3'], 1)
Upvotes: 1
Views: 215
Reputation: 863801
Create mask for compare values greater like 0
by DataFrame.gt
and then use DataFrame.dot
with columns names for matrix multiplication:
cols = ["s1", "s2", "s3"]
df['col'] = df[cols].gt(0).dot(df.columns + ',').str[:-1]
print (df)
s1 s2 s3 col
0 1 0 0 s1
1 1 1 0 s1,s2
2 0 1 2 s2,s3
Upvotes: 5
Reputation: 175
It can be done with a df.apply
and a lambda
function as follows
df["col"] = df.apply(lambda row: ",".join(row[row>0].keys().tolist()), axis=1)
you can also pass specific column names when you have to apply this operation to only a specific columns
df["col2"] = df[["s1", "s2", "s3"]].apply(lambda row: ",".join(row[row>0].keys().tolist()), axis=1)
Upvotes: 0