Reputation: 19
I use OneHotEncoder to transform a categorical column into numerical data, but the encoder changes the names of the columns. How can I keep the original names?
(I am using Python 3.)
My DataFrame looks like this:
>>> import pandas
>>> import numpy
>>> ar = numpy.array([['yassine', 1], ['jack',7], ['ahmed',4]])
>>> df = pandas.DataFrame(ar, columns = ['name', 'label'])
>>> df
      name  label
0  yassine      1
1     jack      7
2    ahmed      4
>>> import category_encoders as ce
>>> ohe = ce.OneHotEncoder(handle_unknown='ignore',
...                        use_cat_names=True)
>>> label_fournisseur = ohe.fit_transform(list(df['name']))
>>> label_fournisseur
   0_yassine  0_jack  0_ahmed
0          1       0        0
1          0       1        0
2          0       0        1
I need the column names to match the original category values, without the added prefix:
   yassine  jack  ahmed
0        1     0      0
1        0     1      0
2        0     0      1
Thank you.
Upvotes: 0
Views: 1356
Reputation: 2248
You can rename the columns as you see fit. To remove the "0_" prefix you can do, for example:
label_fournisseur.columns = [ x[2:] for x in label_fournisseur.columns ]
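Slicing with x[2:] assumes the prefix is always exactly two characters long. A slightly more defensive variant, sketched below, splits each name on the first underscore instead. The DataFrame here is built by hand as a stand-in for the encoder output from the question, so the example runs without category_encoders; the name renamed is only illustrative.

```python
import pandas as pd

# Hand-built stand-in for the encoder output shown in the question
# (an assumption, so that the example does not need category_encoders).
label_fournisseur = pd.DataFrame(
    [[1, 0, 0], [0, 1, 0], [0, 0, 1]],
    columns=["0_yassine", "0_jack", "0_ahmed"],
)

# Split each column name once, on the first underscore, and keep the part
# after it. Names without an underscore are left unchanged, and underscores
# inside the category value itself are preserved.
renamed = label_fournisseur.rename(columns=lambda c: c.split("_", 1)[-1])

print(renamed)
```

This should also cope with a longer prefix such as "10_", which the fixed-width slice would cut incorrectly.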
Another way to get what you want (without another library):
pandas.get_dummies(df["name"])
results in:
   ahmed  jack  yassine
0      0     0        1
1      0     1        0
2      1     0        0
Note: get_dummies does one-hot encoding when the input has one category per observation (your case). For other cases (multiple categories per observation) the question itself would have to change, since a column name could then no longer consist of a single category.
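One more detail: get_dummies orders the columns alphabetically, while the question asks for yassine, jack, ahmed. A minimal sketch of one way to restore the order of first appearance, assuming a reasonably recent pandas (the dtype argument of get_dummies is used to request 0/1 integers, since newer versions may return booleans by default); the names order and dummies are only illustrative:

```python
import pandas as pd

# Same data as in the question, built directly as a DataFrame.
df = pd.DataFrame({"name": ["yassine", "jack", "ahmed"], "label": [1, 7, 4]})

# unique() returns the categories in order of first appearance.
order = list(df["name"].unique())

# One-hot encode with integer 0/1 values, then reorder the columns
# to follow the order of appearance rather than alphabetical order.
dummies = pd.get_dummies(df["name"], dtype=int)[order]

print(dummies)
```

Selecting columns with a list only reorders them, so the encoded values stay the same.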
Upvotes: 2