Reputation: 131
When I execute the command:
clf.fit(train_data, train_label)
I get the following error:
ValueError: Input contains NaN, infinity or a value too large for dtype('float32').
The problem is in the array train_data,
which has shape (18000, 20). I've tried this command:
clf.fit(np.float32(train_data), train_label)
or
train_data = np.array([s[0].astype('float32') for s in train_data])
The datasets train_data and train_label are in the train file (Python) at the following link:
https://www.dropbox.com/s/b3017gi18x6x325/train?dl=0
However, I still cannot get all the values in the array "train_data" to be valid for the clf.fit
function. Any help?
Upvotes: 0
Views: 569
Reputation: 33127
I found a way to get past this error. You need to scale the data:
Code:
from sklearn.ensemble import RandomForestClassifier
import pickle
import numpy as np
from sklearn.preprocessing import scale
with open('train', 'rb') as f:
    train_data, train_label = pickle.load(f)
# Some diagnostics to check for NaNs. No NaNs were found!
print(np.isnan(train_data))
print(np.where(np.isnan(train_data)))
print(np.nan_to_num(train_data))
print(np.isnan(train_label))
print(np.where(np.isnan(train_label)))
# So the data needs to be scaled
train_data = scale(train_data)
clf = RandomForestClassifier()
clf.fit(train_data, train_label)
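The error message also mentions infinity and values too large for float32, and np.isnan does not detect either of those. Here is a minimal sketch of a broader check, assuming the data can be converted to a float64 NumPy array. The helper name report_unsafe_values is hypothetical, and the demo array is synthetic, not the data from the question:

```python
import numpy as np

def report_unsafe_values(X):
    """Count entries that would not survive a cast to float32."""
    X = np.asarray(X, dtype=np.float64)
    f32_max = np.finfo(np.float32).max  # largest finite float32 value
    finite = np.isfinite(X)
    return {
        "nan": int(np.isnan(X).sum()),
        "inf": int(np.isinf(X).sum()),
        # Finite as float64, but outside the float32 range
        "too_large_for_float32": int((finite & (np.abs(X) > f32_max)).sum()),
    }

# Synthetic stand-in for train_data (an assumption, for illustration only)
demo = np.array([[1.0, 2.0], [1e40, np.nan], [np.inf, -3.0]])
report = report_unsafe_values(demo)
print(report)
```

Calling report_unsafe_values(train_data) before scaling should show which of the three cases, if any, is present. If only the last count is non-zero, that would be consistent with scaling making the error go away.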
Upvotes: 1