Reputation: 1
How do I fix this error message: "ValueError: Input contains NaN, infinity or a value too large for dtype('float32')"?
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Loading the dataset
data = pd.read_csv(r'C:\Users\sam.jones\Desktop\Fixed Income project\Data Pull\Data\Fixed Income_Data dump_2018.csv',error_bad_lines=False,encoding = "ISO-8859-2")
# Select the feature column as a 2-D array of shape (n_samples, 1), as scikit-learn expects
X = data.iloc[:, [158]].values
Y = data.iloc[:,92].values
#Fitting Random Forest Regression to the dataset
from sklearn.ensemble import RandomForestRegressor
regressor = RandomForestRegressor(n_estimators = 10, random_state = 0)
regressor.fit(X,Y)
Upvotes: 0
Views: 1686
Reputation: 55
In my case, this error was caused by very large numbers, in particular values written in scientific notation such as 3.63E+08 or 1.25E+09. The solution is to bring those values into a smaller range: you can either simply divide them (for example, x / 1000) or, better, use a function to scale or normalise the data. After that, you can train your model.
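The scaling idea can be sketched with plain NumPy. This is only an illustration: the array `x` is made up to resemble the values quoted in this answer, and standardisation (subtract the mean, divide by the standard deviation) is one assumed choice of scaling among several. The last line compares the raw values against the float32 limit, which is a quick way to check whether magnitude alone could trigger the error.

```python
import numpy as np

# Hypothetical large-magnitude feature values (not from the asker's file)
x = np.array([3.63e8, 1.25e9, 7.10e8, 2.40e9], dtype=np.float64)

# Standardise: zero mean, unit standard deviation
mean = x.mean()
std = x.std()
x_scaled = (x - mean) / std

# Check whether the raw values fit within the float32 range at all
fits_float32 = bool(np.all(np.abs(x) < np.finfo(np.float32).max))

print(x_scaled)
print("fits in float32:", fits_float32)
```

If `fits_float32` is true for your column, the error is more likely to come from NaN or infinite entries than from the size of the numbers.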
Upvotes: 0
Reputation: 2022
The input might contain NaNs, so use np.nan_to_num(X) to replace them with zeros first.
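A minimal sketch of this approach, assuming a made-up array `raw` in place of the asker's column. It first counts the NaN and infinite entries, then replaces them. By default `np.nan_to_num` maps infinities to the largest finite value of the array's dtype, which for float64 input may still be too large for float32, so the sketch passes `posinf` and `neginf` explicitly. Whether zero is a sensible fill value depends on the data.

```python
import numpy as np

# Hypothetical feature column standing in for data.iloc[:, 158]
raw = np.array([1.5, np.nan, 3.0, np.inf, -np.inf, 2.5])

# scikit-learn expects X with shape (n_samples, n_features)
X = raw.reshape(-1, 1)

# Diagnose: count the problematic entries before changing anything
n_nan = int(np.isnan(X).sum())
n_inf = int(np.isinf(X).sum())

# Replace NaN and +/- infinity with zero (an assumed fill value)
X_clean = np.nan_to_num(X, nan=0.0, posinf=0.0, neginf=0.0)

# Confirm that every remaining value is finite
all_finite = bool(np.isfinite(X_clean).all())

print(n_nan, n_inf, all_finite)
```

The target array `Y` can contain NaNs as well, so the same check is worth running on it before calling `regressor.fit(X_clean, Y)`.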
Upvotes: 2