msa
msa

Reputation: 725

How to convert pandas dataframe value to string based on column name

I have a dataframe with about 200k rows and 20 columns looking like this:

List |Apple|Banana|Bread|Milk|Sweets|Water|
List1|0    | 0    |0    | 1  | 0    | 0   |
List2|0    | 1    |0    | 0  | 0    | 0   |
List3|0    | 0    |0    | 1  | 0    | 0   |
List4|0    | 0    |0    | 0  | 1    | 0   |
List5|0    | 0    |1    | 0  | 0    | 0   |

I would like to replace each 1 with the relevant column name and then reduce the dataframe to the following output:

List |Items

List1|Milk    
List2|Banana   
List3|Milk    
List4|Sweets    
List5|Bread  

Any idea how to do this without selecting each column individually and renaming them one by one?

Upvotes: 0

Views: 65

Answers (1)

jezrael
jezrael

Reputation: 862671

Here is solution with DataFrame.dot, which join columns names if multiple 1 per rows by ,, if there is only one value working too:

df = df.set_index('List')
df1 = df.dot(df.columns + ',').str[:-1].reset_index(name='Items')
print (df1)
    List   Items
0  List1    Milk
1  List2  Banana
2  List3    Milk
3  List4  Sweets
4  List5   Bread

If there is always 1 value of 1 per rows is possible use also:

df = df.set_index('List')
df1 = df.dot(df.columns).reset_index(name='Items')
print (df1)
    List   Items
0  List1    Milk
1  List2  Banana
2  List3    Milk
3  List4  Sweets
4  List5   Bread

But if multiple values it working with join without separator like:

print (df)
    List  Apple  Banana  Bread  Milk  Sweets  Water
0  List1      1       0      0     1       0      0 <- Apple,Milk=1
1  List2      0       1      0     0       0      0
2  List3      0       0      0     1       0      0
3  List4      0       0      0     0       1      0
4  List5      0       0      1     0       0      0


df = df.set_index('List')
df1 = df.dot(df.columns).reset_index(name='Items')
print (df1)
    List      Items
0  List1  AppleMilk
1  List2     Banana
2  List3       Milk
3  List4     Sweets
4  List5      Bread

With separator ouput is:

df = df.set_index('List')
df1 = df.dot(df.columns + ',').str[:-1].reset_index(name='Items')
print (df1)
    List       Items
0  List1  Apple,Milk
1  List2      Banana
2  List3        Milk
3  List4      Sweets
4  List5       Bread

Upvotes: 1

Related Questions