Atharva Joshi

How to solve the error "Input contains NaN, infinity or a value too large for dtype('float64')"?

I am working on the Titanic dataset. I have filled in the missing values in the categorical columns, which are at indices 0 to 3, and encoded those columns with LabelEncoder.

When I then apply OneHotEncoder, this error occurs: Input contains NaN, infinity or a value too large for dtype('float64').

There are no NaN values, but I have not been able to fix this error.

I have also tried scaling the data before applying OneHotEncoder, but the error still appears.
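
One way to test the claim that there are no NaN values is to check every column, not only the categorical ones. In scikit-learn versions that still accepted categorical_features, the encoder appears to validate the whole array passed to it, so a NaN left in a numeric column (for example an unfilled Age value) could plausibly raise this error even when columns 0 to 3 are clean. The sketch below assumes the data is still a pandas DataFrame, before .values is called; the helper name columns_with_bad_values and the column names in the sample frame are hypothetical.

```python
import numpy as np
import pandas as pd


def columns_with_bad_values(df: pd.DataFrame) -> list:
    """Return the names of columns that hold NaN/None or +/-inf values.

    Hypothetical helper for illustration, not part of pandas or scikit-learn.
    """
    # isna() flags NaN and None in numeric and object columns alike.
    has_nan = df.isna().any()
    # np.isinf only applies to numeric data, so check just those columns
    # and treat every non-numeric column as free of infinities.
    numeric = df.select_dtypes(include="number")
    has_inf = np.isinf(numeric).any().reindex(df.columns, fill_value=False)
    return [col for col in df.columns if has_nan[col] or has_inf[col]]


# Hypothetical Titanic-like frame: categoricals are filled, Age and Fare are not clean.
train = pd.DataFrame({
    "Sex": ["male", "female", "female"],
    "Embarked": ["S", "C", "S"],
    "Age": [22.0, np.nan, 26.0],
    "Fare": [7.25, 71.28, np.inf],
})
print(columns_with_bad_values(train))  # ['Age', 'Fare']
```

Running it on both the training and test frames would show whether the offending values sit in a column that the categorical fill did not cover.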

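# train and test are pandas DataFrames loaded earlier; the target is the last column of train
y_train = train.iloc[:, -1].values
x_train = train.iloc[:, :-1].values
test = test.iloc[:, :].values

from sklearn.preprocessing import LabelEncoder, OneHotEncoder, StandardScaler

# Label-encode the categorical columns (indices 0-3)
for i in range(4):
    le = LabelEncoder()
    x_train[:, i] = le.fit_transform(x_train[:, i])
    test[:, i] = le.transform(test[:, i])

# Scaling attempt (currently commented out)
#sc = StandardScaler()
#x_train = sc.fit_transform(x_train)
#test = sc.transform(test)

# One-hot encode columns 0-3 (categorical_features was deprecated in
# scikit-learn 0.20 and removed in 0.22)
ohe = OneHotEncoder(categorical_features=[range(4)])
x_train = ohe.fit_transform(x_train).toarray()
test = ohe.transform(test).toarray()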

How to solve this error?
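
The categorical_features parameter in the question's code was deprecated in scikit-learn 0.20 and removed in 0.22, so that call will not run on current releases. A minimal sketch of the same step with the newer API, assuming scikit-learn 0.20 or later, uses ColumnTransformer to one-hot encode columns 0 to 3 directly from their string values (no LabelEncoder pass is needed) and fills any remaining NaNs in the numeric columns with SimpleImputer. The median strategy, handle_unknown="ignore", the helper name build_preprocessor, and the sample rows are assumptions for illustration, not taken from the question.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder

CATEGORICAL = [0, 1, 2, 3]  # assumed layout: columns 0-3 hold category strings


def build_preprocessor(n_columns):
    """Hypothetical helper: one-hot encode CATEGORICAL, impute the other columns."""
    numeric = [i for i in range(n_columns) if i not in CATEGORICAL]
    return ColumnTransformer(
        [
            # OneHotEncoder accepts strings directly; categories unseen during fit
            # become all-zero rows at transform time instead of raising an error.
            ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
            # Replace NaN in the numeric columns with the training-set median.
            ("num", SimpleImputer(strategy="median"), numeric),
        ],
        sparse_threshold=0.0,  # always return a dense array, so no .toarray() is needed
    )


# Made-up rows: 4 categorical columns followed by Age and Fare, with NaNs left in.
x_train = np.array([
    ["male", "S", "A", "x", 22.0, 7.25],
    ["female", "C", "B", "y", np.nan, 71.28],
    ["female", "S", "A", "x", 26.0, np.nan],
], dtype=object)

pre = build_preprocessor(x_train.shape[1])
encoded = pre.fit_transform(x_train)
print(encoded.shape)  # (3, 10): 8 one-hot columns + Age + Fare
```

Fitting on the training array and calling only transform on the test array keeps the categories and medians learned from the training data.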

Answers (1)

There

I was also seeing occasional NaNs pop up in a column after using LabelEncoder, although my code was slightly different:

df['c'] = pd.DataFrame(LabelEncoder().fit_transform(df['c']))

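In this case, the solution was simply to assign the encoded array directly instead of wrapping it in a new DataFrame. pd.DataFrame(...) creates a fresh 0..n-1 index, and assigning it to df['c'] aligns on index labels, so whenever df has a different index (for example after rows were dropped), the unmatched rows become NaN: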

df['c'] = LabelEncoder().fit_transform(df['c'])
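
The index misalignment can be reproduced in a few lines. This sketch assumes a DataFrame whose index has a gap, created here with a hypothetical dropna() step; the column name c follows the answer, while the sample values are made up.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Dropping a row leaves the index as [0, 1, 3, 4] instead of [0, 1, 2, 3].
df = pd.DataFrame({"c": ["male", "female", None, "female", "male"]}).dropna()

# Wrapping the result in pd.DataFrame gives it a fresh 0..n-1 index, and the
# column assignment aligns on index labels: label 4 has no match (NaN) and
# label 3 receives the code computed for a different row.
wrapped = df.copy()
wrapped["c"] = pd.DataFrame(LabelEncoder().fit_transform(wrapped["c"]))

# Assigning the plain NumPy array is positional, so every row keeps its own code.
direct = df.copy()
direct["c"] = LabelEncoder().fit_transform(direct["c"])

print(wrapped["c"].tolist())  # [1.0, 0.0, 1.0, nan]
print(direct["c"].tolist())   # [1, 0, 0, 1]
```

Passing index=df.index to pd.DataFrame would also avoid the mismatch, but assigning the array directly is simpler.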