Reputation: 725
I have a dataframe with about 200k rows and 20 columns looking like this:
List |Apple|Banana|Bread|Milk|Sweets|Water|
List1|0 | 0 |0 | 1 | 0 | 0 |
List2|0 | 1 |0 | 0 | 0 | 0 |
List3|0 | 0 |0 | 1 | 0 | 0 |
List4|0 | 0 |0 | 0 | 1 | 0 |
List5|0 | 0 |1 | 0 | 0 | 0 |
I would like to replace each 1
with the relevant column name and then reduce the dataframe to the following output:
List |Items
List1|Milk
List2|Banana
List3|Milk
List4|Sweets
List5|Bread
Any idea how to do this without selecting each column individually and renaming them one by one?
Upvotes: 0
Views: 65
Reputation: 862671
Here is solution with DataFrame.dot
, which join columns names if multiple 1
per rows by ,
, if there is only one value working too:
df = df.set_index('List')
df1 = df.dot(df.columns + ',').str[:-1].reset_index(name='Items')
print (df1)
List Items
0 List1 Milk
1 List2 Banana
2 List3 Milk
3 List4 Sweets
4 List5 Bread
If there is always 1 value of 1
per rows is possible use also:
df = df.set_index('List')
df1 = df.dot(df.columns).reset_index(name='Items')
print (df1)
List Items
0 List1 Milk
1 List2 Banana
2 List3 Milk
3 List4 Sweets
4 List5 Bread
But if multiple values it working with join without separator like:
print (df)
List Apple Banana Bread Milk Sweets Water
0 List1 1 0 0 1 0 0 <- Apple,Milk=1
1 List2 0 1 0 0 0 0
2 List3 0 0 0 1 0 0
3 List4 0 0 0 0 1 0
4 List5 0 0 1 0 0 0
df = df.set_index('List')
df1 = df.dot(df.columns).reset_index(name='Items')
print (df1)
List Items
0 List1 AppleMilk
1 List2 Banana
2 List3 Milk
3 List4 Sweets
4 List5 Bread
With separator ouput is:
df = df.set_index('List')
df1 = df.dot(df.columns + ',').str[:-1].reset_index(name='Items')
print (df1)
List Items
0 List1 Apple,Milk
1 List2 Banana
2 List3 Milk
3 List4 Sweets
4 List5 Bread
Upvotes: 1