ds_user

Reputation: 2179

pandas assign value in multiple columns based on value in one

I have a dataset like this:

import pandas as pd

sample = {'Theme': ['never give a ten', 'interaction speed', 'no feedback,premium'],
          'cat1': [0, 0, 0],
          'cat2': [0, 0, 0],
          'cat3': [0, 0, 0],
          'cat4': [0, 0, 0]
          }

# build the frame with an explicit column order
df = pd.DataFrame(sample, columns=['Theme', 'cat1', 'cat2', 'cat3', 'cat4'])
print(df)


                 Theme  cat1  cat2  cat3  cat4
0     never give a ten     0     0     0     0
1    interaction speed     0     0     0     0
2  no feedback,premium     0     0     0     0

Now I need to set the values in the cat columns based on the value in Theme. If Theme contains 'never give a ten', set cat1 to 1; if it contains 'interaction speed', set cat2 to 1; if it contains 'no feedback', set cat3 to 1; and if it contains 'premium', set cat4 to 1.

This sample has 4 categories, but I have 21 in total. I could write an "if word in string" check 21 times, once per category, but I am looking for a more efficient way: a function that goes through every row, applies this logic, and updates the corresponding columns. Can anyone help, please?

Thanks in advance.

Upvotes: 0

Views: 208

Answers (1)

jezrael

Reputation: 863801

You can create indicator columns named after the categories with Series.str.get_dummies. Note that the resulting column names are sorted alphabetically:

df1 = df['Theme'].str.get_dummies(',')
print (df1)
   interaction speed  never give a ten  no feedback  premium
0                  0                 1            0        0
1                  1                 0            0        0
2                  0                 0            1        1
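One caveat, stated as an assumption about the real data rather than something shown in the question: get_dummies splits on the exact separator, so if some rows are written as 'no feedback, premium' (with a space after the comma), the space stays in the label and a separate ' premium' column appears. A minimal sketch of normalizing the separators first, using a made-up two-row Series:

```python
import pandas as pd

# Hypothetical data with inconsistent spacing around the comma
themes = pd.Series(['no feedback, premium', 'premium'])

# Splitting as-is keeps the leading space, so ' premium' and 'premium' differ
raw = themes.str.get_dummies(',')

# Collapse whitespace around commas and trim the ends before splitting
normalized = themes.str.replace(r'\s*,\s*', ',', regex=True).str.strip()
clean = normalized.str.get_dummies(',')

print(list(raw.columns))
print(list(clean.columns))
```

If the Theme values are already consistent, this step is not needed.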

If you need the original Theme column in the output, add DataFrame.join:

df11 = df[['Theme']].join(df['Theme'].str.get_dummies(','))
print (df11)
                 Theme  interaction speed  never give a ten  no feedback  \
0     never give a ten                  0                 1            0   
1    interaction speed                  1                 0            0   
2  no feedback,premium                  0                 0            1   

   premium  
0        0  
1        0  
2        1  

If the order of the columns is important, add DataFrame.reindex:

# remove possible duplicates while preserving first-seen order
cols = dict.fromkeys([y for x in df['Theme'] for y in x.split(',')]).keys()
df2 = df['Theme'].str.get_dummies(',').reindex(cols, axis=1)
print (df2)
   never give a ten  interaction speed  no feedback  premium
0                 1                  0            0        0
1                 0                  1            0        0
2                 0                  0            1        1
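The dict.fromkeys step is what keeps the first-seen order while dropping repeated labels, since Python dictionaries preserve insertion order (Python 3.7+). A small sketch in plain Python, with an illustrative list of themes that is not from the question:

```python
# Illustrative themes; 'premium' and 'no feedback' each occur twice
themes = ['no feedback,premium', 'premium', 'never give a ten,no feedback']

# Flatten every comma-separated label into one list, duplicates included
labels = [label for theme in themes for label in theme.split(',')]

# dict keys are unique and keep insertion order, so this deduplicates in order
ordered = list(dict.fromkeys(labels))
print(ordered)  # ['no feedback', 'premium', 'never give a ten']
```

A set would also remove the duplicates, but it would not guarantee the order.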


To keep the Theme column in front as well, combine the reindex with the join:

cols = dict.fromkeys([y for x in df['Theme'] for y in x.split(',')]).keys()
df2 = df[['Theme']].join(df['Theme'].str.get_dummies(',').reindex(cols, axis=1))
print (df2)
                 Theme  never give a ten  interaction speed  no feedback  \
0     never give a ten                 1                  0            0   
1    interaction speed                 0                  1            0   
2  no feedback,premium                 0                  0            1   

   premium  
0        0  
1        0  
2        1  
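To get the exact cat1..cat4 column names from the question, one option is an explicit mapping from theme text to column name. This is a sketch, not part of the code above: the theme_to_cat dictionary is an assumed name and would have to be extended by hand to all 21 categories. Reindexing on the mapping keys fixes the column order and also creates an all-zero column for any category that does not occur in the data.

```python
import pandas as pd

df = pd.DataFrame({'Theme': ['never give a ten',
                             'interaction speed',
                             'no feedback,premium']})

# Assumed mapping from theme text to target column; extend to all categories
theme_to_cat = {
    'never give a ten': 'cat1',
    'interaction speed': 'cat2',
    'no feedback': 'cat3',
    'premium': 'cat4',
}

dummies = (df['Theme'].str.get_dummies(',')
           # keep only mapped themes, in mapping order; absent ones become 0
           .reindex(columns=list(theme_to_cat), fill_value=0)
           # swap the theme text for the target column name
           .rename(columns=theme_to_cat))

result = df[['Theme']].join(dummies)
print(result)
```

Themes that are missing from the mapping are dropped by the reindex, so it is worth checking the mapping against the distinct labels in the data.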

Upvotes: 1
