uri2563

Reputation: 33

Scikit-learn RandomForestClassifier error

I am using Python 3.5 and I have NumPy, SciPy, and matplotlib installed and imported.

When I try:

# Import the random forest package
from sklearn.ensemble import RandomForestClassifier

# Create the random forest object which will include all the parameters
# for the fit
forest = RandomForestClassifier(n_estimators = 1)

# Fit the training data to the Survived labels and create the decision trees
forest = forest.fit(train_data[0::,1::],train_data[0::,0])

# Take the same decision trees and run it on the test data
output = forest.predict(test_data)

Both test_data and train_data are float arrays. I get the following error:

C:\Users\Uri\AppData\Local\Programs\Python\Python35-32\lib\site-packages\sklearn\utils\fixes.py:64: DeprecationWarning: inspect.getargspec() is deprecated, use inspect.signature() instead
  if 'order' in inspect.getargspec(np.copy)[0]:
C:\Users\Uri\AppData\Local\Programs\Python\Python35-32\lib\site-packages\sklearn\base.py:175: DeprecationWarning: inspect.getargspec() is deprecated, use inspect.signature() instead
  args, varargs, kw, default = inspect.getargspec(init)
C:\Users\Uri\AppData\Local\Programs\Python\Python35-32\lib\site-packages\sklearn\base.py:175: DeprecationWarning: inspect.getargspec() is deprecated, use inspect.signature() instead
  args, varargs, kw, default = inspect.getargspec(init)
C:\Users\Uri\AppData\Local\Programs\Python\Python35-32\lib\site-packages\sklearn\base.py:175: DeprecationWarning: inspect.getargspec() is deprecated, use inspect.signature() instead
  args, varargs, kw, default = inspect.getargspec(init)
C:\Users\Uri\AppData\Local\Programs\Python\Python35-32\lib\site-packages\sklearn\base.py:175: DeprecationWarning: inspect.getargspec() is deprecated, use inspect.signature() instead
  args, varargs, kw, default = inspect.getargspec(init)
Traceback (most recent call last):
  File "C:/Users/Uri/PycharmProjects/titanic1/fdsg.py", line 54, in <module>
    output = forest.predict(test_data)
  File "C:\Users\Uri\AppData\Local\Programs\Python\Python35-32\lib\site-packages\sklearn\ensemble\forest.py", line 461, in predict
    X = check_array(X, ensure_2d=False, accept_sparse="csr")
  File "C:\Users\Uri\AppData\Local\Programs\Python\Python35-32\lib\site-packages\sklearn\utils\validation.py", line 352, in check_array
    _assert_all_finite(array)
  File "C:\Users\Uri\AppData\Local\Programs\Python\Python35-32\lib\site-packages\sklearn\utils\validation.py", line 52, in _assert_all_finite
    " or a value too large for %r." % X.dtype)
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
Process finished with exit code 1

Upvotes: 1

Views: 3186

Answers (1)

Euclides

Reputation: 287

from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer  # replaces the Imputer class, which was removed in scikit-learn 0.22
import numpy as np

X = np.random.randint(0, (2**31)-1, (500, 4)).astype(object)
y = np.random.randint(0, 2, 500)
clf = RandomForestClassifier()
print(X.max())
clf.fit(X, y) # OK
print("First fit OK")

# Case 1: your data contains null (NaN) values
X[0,0] = np.nan # replace one of the cells with a null value
#clf.fit(X, y) # gives you the same error

# To handle NaN values you can use the SimpleImputer class:
imp = SimpleImputer(strategy='median')
X_ok = imp.fit_transform(X)
clf.fit(X_ok, y)

# Case 2: your data contains huge integers
X[0,0] = 2**128 # the same happens if you have a huge integer
#clf.fit(X, y) # gives you the same error
# To handle this you can clip your values to some cap
X_ok = X.clip(-2**63, 2**63) # 2**63 is only an example; choose a cap that makes sense for your application
clf.fit(X_ok, y)
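
In the question the traceback is raised by forest.predict(test_data), so the offending values are most likely in the test array rather than the training array. Below is a minimal sketch of how one might locate them and fill NaN cells with medians taken from the training features. It uses plain NumPy as a stand-in for the imputer step above. The helper names find_non_finite and fill_nan_with_medians are hypothetical, and the small arrays are made-up stand-ins for train_data and test_data.

```python
import numpy as np

def find_non_finite(arr):
    """Return the (row, column) index pairs of NaN or infinite cells."""
    arr = np.asarray(arr, dtype=np.float64)
    return np.argwhere(~np.isfinite(arr))

def fill_nan_with_medians(arr, medians):
    """Replace NaN cells with the per-column value given in `medians`."""
    arr = np.asarray(arr, dtype=np.float64)
    return np.where(np.isnan(arr), medians, arr)

# Made-up feature arrays standing in for the real data (assumption)
train_features = np.array([[3.0, 22.0],
                           [1.0, 38.0],
                           [2.0, 26.0]])
test_features = np.array([[3.0, np.nan],
                          [1.0, 47.0]])

# Locate the cells that would make predict() fail
bad_cells = find_non_finite(test_features)
print(bad_cells)

# Compute medians on the training features only, then apply them to the test features
train_medians = np.nanmedian(train_features, axis=0)
test_filled = fill_nan_with_medians(test_features, train_medians)
print(test_filled)
```

Taking the medians from the training features and reusing them on the test features keeps both arrays on the same footing. With the imputer shown earlier, the equivalent would be calling fit_transform on the training features and transform on the test features.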

Upvotes: 1
