Reputation: 13
I am analyzing the PUBG dataset from Kaggle and building a model on it. I have extracted all the features and standardized them using StandardScaler from sklearn.
//Snippet
X=standardized_data
y=training_features_output
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.30,random_state=42)
print(standardized_data.shape,training_features_output.shape)
[Output]: (4446966, 16) (4446966,)
print(np.all(np.isinf(standardized_data)))
print(np.all(np.isinf(training_features_output)))
print(np.all(np.isnan(standardized_data)))
print(np.all(np.isnan(training_features_output)))
[Output]:
False
False
False
False
print(X.dtype)
print(y.dtype)
[Output]:
dtype('float64')
dtype('float64')
model=LinearRegression()
model.fit(X_train,y_train)
y_train_pred=model.predict(X_train)
y_test_pred=model.predict(X_test)
print('Train r2_accuracy:',r2_score(y_train,y_train_pred))
print('Test r2_accuracy:',r2_score(y_test,y_test_pred))
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
From the outputs above, it looks like there are no NaN or infinite values in the dataset, and the data is float64. So why am I getting this error, and how do I resolve it?
I looked at other questions about this error on Stack Overflow, but in all of them the data contained NaN or was otherwise broken, and I can't see where my code goes wrong.
Upvotes: 1
Views: 7582
Reputation: 33147
Your check is not correct: np.all() tests whether all of the values are inf (or NaN), so it prints False as long as at least one value is finite.
print(np.all(np.isinf(standardized_data)))
...
Use np.any() instead, which returns True if at least one value is inf (or NaN).
Proof:
import numpy as np
a = [np.inf, 0, 1]
np.all(np.isinf(a))
# False
np.any(np.isinf(a))
# True
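Once np.any() confirms that something is wrong, the next step is usually to find out where. The sketch below is one possible way to do that, not the only one: `report_non_finite`, `X_demo` and `y_demo` are hypothetical names introduced here for illustration, and the small demo arrays stand in for your `standardized_data` and `training_features_output`. Dropping the offending rows is just one option (imputing is another); which is appropriate depends on your data.

```python
import numpy as np

def report_non_finite(arr, name="array"):
    """Print NaN/inf counts for `arr` and return the indices of rows containing them."""
    arr = np.asarray(arr, dtype=np.float64)
    bad = ~np.isfinite(arr)  # True where a value is NaN, +inf or -inf
    print(f"{name}: {np.isnan(arr).sum()} NaN, {np.isinf(arr).sum()} inf")
    if arr.ndim == 1:
        return np.flatnonzero(bad)
    return np.flatnonzero(bad.any(axis=1))  # rows with at least one bad value

# Hypothetical stand-in data; replace with standardized_data / training_features_output
X_demo = np.array([[0.1, 0.2], [np.nan, 0.3], [0.4, np.inf], [0.5, 0.6]])
y_demo = np.array([1.0, 0.5, 0.2, np.nan])

# Rows that are bad in either the features or the target
bad_rows = np.union1d(report_non_finite(X_demo, "X"),
                      report_non_finite(y_demo, "y"))

# One possible fix: drop those rows from both arrays before train_test_split
keep = np.ones(len(X_demo), dtype=bool)
keep[bad_rows] = False
X_clean, y_clean = X_demo[keep], y_demo[keep]
```

Note that the target has to be checked as well as the features, since a single NaN in y is enough to trigger the same ValueError.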
Upvotes: 1