yassine

Reputation: 19

OneHotEncoder changes the column names

I use OneHotEncoder to transform a categorical column into numerical data, but the encoder changes the column names. How can I keep the original names as the column names?

(I use Python 3.)

My DataFrame looks like this:

>>> import pandas 
>>> import numpy
>>> ar = numpy.array([['yassine', 1], ['jack',7], ['ahmed',4]])
>>> df = pandas.DataFrame(ar, columns = ['name', 'label'])
>>> df
      name label
0  yassine     1
1     jack     7
2    ahmed     4


>>> import category_encoders as ce
>>> ohe = ce.OneHotEncoder(handle_unknown='ignore',
...                        use_cat_names=True)
>>> label_fournisseur = ohe.fit_transform(list(df['name']))
>>> label_fournisseur
   0_yassine  0_jack  0_ahmed
0          1       0        0
1          0       1        0
2          0       0        1

I need the column names to be just the original values, without any prefix:

   yassine  jack  ahmed
0        1     0      0
1        0     1      0
2        0     0      1

Thank you.

Upvotes: 0

Views: 1356

Answers (1)

vladmihaisima

Reputation: 2248

You can rename the columns however you like. For example, to remove the "0_" prefix:

label_fournisseur.columns = [ x[2:] for x in label_fournisseur.columns ]
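The slice `x[2:]` relies on the prefix being exactly two characters ("0_"). A slightly more general sketch, assuming every generated name has the form `<prefix>_<category>`, splits on the first underscore only. The DataFrame below is built by hand to mimic the encoder output shown in the question, so category_encoders is not needed to run it.

```python
import pandas as pd

# Hand-built stand-in for the encoder output in the question (assumption:
# the real output uses the same "<prefix>_<category>" naming pattern).
encoded = pd.DataFrame(
    [[1, 0, 0], [0, 1, 0], [0, 0, 1]],
    columns=["0_yassine", "0_jack", "0_ahmed"],
)

# Split each name on the first "_" only and keep the part after it,
# so category values that themselves contain "_" are kept whole.
encoded.columns = [name.split("_", 1)[1] for name in encoded.columns]

print(list(encoded.columns))  # ['yassine', 'jack', 'ahmed']
```

If the prefix itself can contain an underscore (for example a hypothetical column named `supplier_name`), strip the known prefix by length instead, e.g. `name[len("supplier_name_"):]`.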

Another way to get what you want, without an extra library, is pandas' own `get_dummies`:

pandas.get_dummies(df["name"])

This returns:

   ahmed  jack  yassine
0      0     0        1
1      0     1        0
2      1     0        0
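Two details may differ from the output above depending on your pandas version: since pandas 2.0, `get_dummies` returns boolean (True/False) columns unless `dtype` is given, and the new columns are sorted alphabetically rather than in the order the names appear. A minimal sketch that requests 0/1 integers, restores first-appearance order, and keeps the `label` column next to the encoding:

```python
import numpy
import pandas

ar = numpy.array([['yassine', 1], ['jack', 7], ['ahmed', 4]])
df = pandas.DataFrame(ar, columns=['name', 'label'])

# dtype=int asks for 0/1 integers (pandas >= 2.0 would otherwise give bools).
dummies = pandas.get_dummies(df["name"], dtype=int)

# get_dummies sorts the categories; reorder them by first appearance instead.
dummies = dummies[list(df["name"].unique())]

# Put the encoded columns next to the remaining original column(s).
result = pandas.concat([df.drop(columns="name"), dummies], axis=1)
print(result)
```

The reordering step is optional; it only matters if you want the columns in the same order as the desired output in the question.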

Note: `get_dummies` does one-hot encoding when each observation has exactly one category (your case). If an observation can have several categories, the problem has to be posed differently, since each row's value is then no longer a single category that can become one column name.
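For the multi-category case, one option (not part of the original answer) is pandas' `Series.str.get_dummies`, which splits a delimited string into several indicator columns. This is multi-hot rather than one-hot encoding, since a row can have more than one 1. The comma-separated data below is hypothetical:

```python
import pandas

# Hypothetical data: each row may carry several categories, comma-separated.
tags = pandas.Series(["jack,ahmed", "yassine", "ahmed"])

# One indicator column per distinct category (sorted alphabetically);
# a row gets a 1 in every column whose category it contains.
multi_hot = tags.str.get_dummies(sep=",")
print(multi_hot)
```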

Upvotes: 2
