Reputation: 327
I'm using LabelEncoder and OneHotEncoder to handle categorical data in my dataset. One column can take two values, either 'Petrol' or 'Diesel', and I want to encode that column. When I run the following code, it raises an error.
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder,OneHotEncoder
dataset = pd.read_csv('ToyotaCorolla.csv')
X = dataset.iloc[:, 1:10].values
y = dataset.iloc[:, 0].values
labelencoder_X = LabelEncoder()
X[:, 3] = labelencoder_X.fit_transform(X[:, 3])
onehotencoder = OneHotEncoder(categorical_features = [3])
X = onehotencoder.fit_transform(X).toarray()
Column 3 (X[:, 3]) is the one that holds the categorical values. However, the code raises "ValueError: could not convert string to float: 'Diesel'". I don't know where I'm going wrong. Please help. Thanks!
Upvotes: 5
Views: 10101
Reputation: 41
This error occurs when X contains a column whose categories are still in string format. When I had this error, I applied LabelEncoder to all the categorical columns in X, as you did for column 3, and then applied OneHotEncoder to column 3.
So what you have to do is LabelEncode all the categorical columns in X, and then apply OneHotEncoder to your desired column.
Upvotes: 0
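A rough sketch of what the answer above describes, label-encoding every column that still holds strings so that X becomes fully numeric. The DataFrame here is made up for illustration (the column names and values are assumptions, not taken from ToyotaCorolla.csv):

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical stand-in for the feature columns; names and values are invented.
df = pd.DataFrame({
    "Age": [23, 30, 41, 12],
    "FuelType": ["Diesel", "Petrol", "Petrol", "Diesel"],
    "Color": ["Blue", "Red", "Blue", "Grey"],
})

# Label-encode every non-numeric column, one encoder per column.
encoded = df.copy()
for col in df.columns:
    if not pd.api.types.is_numeric_dtype(df[col]):
        encoded[col] = LabelEncoder().fit_transform(df[col])

# X no longer contains strings, so it can be converted to float.
X = encoded.to_numpy(dtype=float)
```

Note that label-encoded integers imply an ordering between categories, which is why the answer still one-hot encodes the column of interest afterwards.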
Reputation: 2129
categorical_features is deprecated (and has been removed in newer scikit-learn releases). Instead, transform your categorical feature directly:
onehotencoder = OneHotEncoder(categories='auto')
feature = onehotencoder.fit_transform(X[:, 3].reshape(-1, 1))
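If you also want the other columns kept alongside the encoded one, one possible alternative (not part of the snippet above) is ColumnTransformer from sklearn.compose. A minimal sketch, assuming the fuel type sits in column 3 and using invented sample rows:

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical rows; only the position of the fuel column (index 3) mirrors the question.
X = np.array([
    [23, 46986, 90, "Diesel"],
    [30, 38500, 110, "Petrol"],
    [41, 61000, 86, "Petrol"],
    [12, 15000, 90, "Diesel"],
], dtype=object)

# One-hot encode column 3 and pass the remaining columns through unchanged.
# sparse_threshold=0 asks ColumnTransformer to always return a dense array.
ct = ColumnTransformer(
    [("fuel", OneHotEncoder(categories="auto"), [3])],
    remainder="passthrough",
    sparse_threshold=0,
)

# The encoded columns come first, followed by the passed-through columns.
X_encoded = np.asarray(ct.fit_transform(X), dtype=float)
```

No LabelEncoder step is needed here, since OneHotEncoder accepts string categories directly.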
Upvotes: 5