Reputation: 65
I am working on the Titanic dataset. I have filled the missing values in the categorical columns, which occupy indices 0 through 3, and encoded those columns with LabelEncoder.
When I then apply OneHotEncoder, this error occurs: Input contains NaN, infinity or a value too large for dtype('float64').
There are no NaN values, and I have not been able to fix this error.
I have tried scaling before using OneHotEncoder, but the error still appears.
y_train = train.iloc[:,-1].values
x_train = train.iloc[:,:-1].values
test = test.iloc[:,:].values
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, StandardScaler
for i in range(4):
    le = LabelEncoder()
    x_train[:, i] = le.fit_transform(x_train[:, i])
    test[:, i] = le.transform(test[:, i])
#sc = StandardScaler()
#x_train = sc.fit_transform(x_train)
#test = sc.transform(test)
ohe = OneHotEncoder(categorical_features=[range(4)])
x_train = ohe.fit_transform(x_train).toarray()
test = ohe.transform(test).toarray()
How can I solve this error?
Upvotes: 2
Views: 330
Reputation: 516
I was also seeing occasional NaNs appear in a column when using LabelEncoder, although my code was a little different:
df['c'] = pd.DataFrame(LabelEncoder().fit_transform(df['c']))
In this case, the solution was simply changing this to:
df['c'] = LabelEncoder().fit_transform(df['c'])
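My understanding of why this helps, as a minimal sketch rather than a confirmed diagnosis of the question's data: wrapping the encoded array in pd.DataFrame gives it a fresh 0..n-1 index, and pandas aligns on index labels during assignment, so any row of df whose label is not in that range ends up as NaN. The small frame and its index values below are made up for illustration.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical frame whose index is NOT 0..n-1 (e.g. after dropping rows).
df = pd.DataFrame({"c": ["b", "a", "c", "a"]}, index=[0, 2, 5, 7])

# fit_transform returns a plain NumPy array of integer codes.
codes = LabelEncoder().fit_transform(df["c"])

# Wrapped in a DataFrame, the codes get a new index 0, 1, 2, 3.
# Assignment aligns on labels, so only labels 0 and 2 find a match;
# rows labelled 5 and 7 become NaN.
wrapped = df.copy()
wrapped["c"] = pd.DataFrame(codes)
nan_count_wrapped = int(wrapped["c"].isna().sum())

# Assigning the raw array is positional, so nothing is lost.
direct = df.copy()
direct["c"] = codes
nan_count_direct = int(direct["c"].isna().sum())

print(nan_count_wrapped, nan_count_direct)
```

If the frame already has a default 0..n-1 index, both forms give the same result, which would explain why the NaNs only showed up occasionally.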
Upvotes: 0