pandas dataframe label columns encoding

Question

I have a pandas DataFrame with string input columns. df looks like this:

news                          label1      label2      label3  label4
COVID Hospitalizations ....   health
will pets contract covid....  health      pets
High temperature will cause.. health      weather
...

Expected output

news                          health      pets      weather  tech
COVID Hospitalizations ....   1           0         0        0 
will pets contract covid....  1           1         0        0
High temperature will cause.. 1           0         1        0
...
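Note that the expected output has a tech column even though none of the three rows shown carries that label (presumably it appears in the elided rows). A data-driven encoder, including MultiLabelBinarizer with its default settings, only creates columns for labels it actually sees. Below is a minimal pandas-only sketch that pins the column set. It assumes the full label vocabulary is known in advance (ALL_LABELS is a hypothetical name) and that empty label cells are stored as missing values (None/NaN). The news strings are shortened stand-ins for the rows above.

```python
import pandas as pd

# Sample rows from the question; blank label cells assumed to be None
df = pd.DataFrame({
    'news': ['COVID Hospitalizations', 'will pets contract covid', 'High temperature will cause'],
    'label1': ['health', 'health', 'health'],
    'label2': [None, 'pets', 'weather'],
    'label3': [None, None, None],
    'label4': [None, None, None],
})

# Hypothetical full label vocabulary, assumed to be known up front
ALL_LABELS = ['health', 'pets', 'weather', 'tech']

label_cols = ['label1', 'label2', 'label3', 'label4']
# Join each row's non-missing labels with '|' so Series.str.get_dummies can split them
joined = df[label_cols].apply(lambda row: '|'.join(row.dropna()), axis=1)
dummies = joined.str.get_dummies(sep='|')
# reindex adds vocabulary labels that never occur (here 'tech') as all-zero columns
dummies = dummies.reindex(columns=ALL_LABELS, fill_value=0)
df_final = pd.concat([df['news'], dummies], axis=1)
print(df_final)
```

The same reindex(columns=...) step can be applied to the result of any of the encoders in this thread.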

Currently I use sklearn:

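import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

label_cols = ['label1', 'label2', 'label3', 'label4']
# one list of labels per row, skipping blank/missing cells so they don't become a class
df['labels'] = [[x for x in row if pd.notna(x) and x != ''] for row in df[label_cols].values.tolist()]
mlb = MultiLabelBinarizer()
temp = mlb.fit_transform(df['labels'])
# keep df's index so concat lines the rows up correctly
ff = pd.DataFrame(temp, columns=list(mlb.classes_), index=df.index)
df_final = pd.concat([df['news'], ff], axis=1)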

This works so far. I'm just wondering whether there is a way to do this without sklearn.preprocessing.MultiLabelBinarizer.
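One pandas-only sketch, assuming empty label cells are stored as missing values (None/NaN): reshape the four label columns into one long Series with melt, one-hot encode it with pd.get_dummies, then collapse back to one row per news item with groupby(level=0).max(). The sample frame is rebuilt from the three rows shown, with shortened news strings.

```python
import pandas as pd

# Sample rows from the question; blank label cells assumed to be None
df = pd.DataFrame({
    'news': ['COVID Hospitalizations', 'will pets contract covid', 'High temperature will cause'],
    'label1': ['health', 'health', 'health'],
    'label2': [None, 'pets', 'weather'],
    'label3': [None, None, None],
    'label4': [None, None, None],
})

label_cols = ['label1', 'label2', 'label3', 'label4']
# Long format: one row per (news item, label cell); ignore_index=False keeps the original row index
long_labels = df[label_cols].melt(ignore_index=False)['value'].dropna()
# One indicator column per distinct label, then take the max per original row
onehot = pd.get_dummies(long_labels).groupby(level=0).max().astype(int)
# Rows with no labels at all disappear after dropna(); reindex restores them as all-zero rows
onehot = onehot.reindex(df.index, fill_value=0)
df_final = df[['news']].join(onehot)
print(df_final)
```

The columns come only from labels present in the data, so a label such as tech that never occurs needs an extra reindex(columns=[...]). melt(ignore_index=False) requires pandas 1.1 or later.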
