Reputation: 35
I need help. I'm working on a machine learning project and tried to import a dataset using this code:
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Importing the dataset
dataset = pd.read_csv('Rural3.csv', low_memory=False)
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 77].values
# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)
# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
However, I get this error: ValueError: Input contains infinity or a value too large for dtype('float64')
What should I do? I'm new to Python. Thanks in advance.
Upvotes: 0
Views: 30756
Reputation: 76
import numpy as np
df_new = df[np.isfinite(df).all(axis=1)]
This removes the rows that contain infinity or NaN values.
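As a minimal sketch of the filter above on toy data (the column names `a` and `b` are made up for illustration), it may help to restrict the check to numeric columns first, since `np.isfinite` is not defined for string columns:

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the real dataset (hypothetical values).
df = pd.DataFrame({
    "a": [1.0, np.inf, 3.0, 4.0],
    "b": [0.5, 2.0, np.nan, 1.5],
})

# np.isfinite is False for inf, -inf and NaN; check numeric columns only.
numeric = df.select_dtypes(include=[np.number])
mask = np.isfinite(numeric).all(axis=1)

# Keep only the rows where every numeric value is finite.
df_new = df[mask]
```

Here rows 1 and 2 are dropped because they contain `inf` and `NaN` respectively.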
Upvotes: 0
Reputation: 304
This solution works well. It fixed the error I got while applying a power transform:
df = df[~df.isin([np.nan, np.inf, -np.inf]).any(axis=1)]
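Before dropping rows, it can be useful to see which columns actually hold the bad values. The following is only a sketch on made-up data (the columns `x`, `y`, `z` are hypothetical); it treats infinities as missing and counts them per column:

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the real dataset (hypothetical values).
df = pd.DataFrame({
    "x": [1.0, -np.inf, 3.0],
    "y": [np.nan, 2.0, 5.0],
    "z": [7.0, 8.0, 9.0],
})

# Mark every inf, -inf or NaN cell as True.
bad = df.replace([np.inf, -np.inf], np.nan).isna()

# Number of bad cells in each column.
bad_per_column = bad.sum()

# Drop every row that has at least one bad cell.
df_clean = df[~bad.any(axis=1)]
```

If one column accounts for most of the bad cells, dropping or fixing that column may lose less data than dropping rows.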
Upvotes: 4
Reputation: 1
This error can be misleading. You can also get it when the dataset has blank values, that is, when certain features are empty in some rows. To resolve it, first export the data so you can inspect it:
compression_opts = dict(method='zip',archive_name='out.csv')
df.to_csv('out.zip', index=False, compression=compression_opts)
You can also look up the rows where a given column is blank:
df[df['column_name'] == ''].index
Identify the features that have blank values by analyzing the output CSV.
Then remove every record that has blank values with the code below:
df = df.dropna(subset=['column_name'])
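One caveat: `dropna` only removes NaN, not empty strings, so blanks that were loaded as `''` need to be converted first. A possible sketch, assuming the column is meant to be numeric (the data and the `other` column are made up for illustration):

```python
import pandas as pd

# Toy frame where a numeric column was loaded as text with blank entries.
df = pd.DataFrame({
    "column_name": ["1.5", "", "2.0", "   "],
    "other": [1, 2, 3, 4],
})

# Convert to numbers; anything unparseable (including blanks) becomes NaN.
df["column_name"] = pd.to_numeric(df["column_name"], errors="coerce")

# Index labels of the rows that were blank.
blank_rows = df.index[df["column_name"].isna()]

# Now dropna can remove them.
df = df.dropna(subset=["column_name"])
```

Note that `errors="coerce"` also turns any other non-numeric text into NaN, so check `blank_rows` before dropping.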
Upvotes: 0
Reputation: 3720
I would suggest checking whether you have null values. After loading the dataset with pandas, do the following:
dataset = dataset.dropna()
Also make sure that your X values are numeric. You can use either dataset.describe() or dataset.info():
dataset.info() # prints info about the dataset columns
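If reading the `info()` output by eye is tedious, the same check can be done programmatically. This is just a sketch on a made-up frame (the columns `age`, `income` and `region` are hypothetical):

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the real dataset (hypothetical values).
dataset = pd.DataFrame({
    "age": [25, 32, 47],
    "income": [1e3, np.inf, 3e3],
    "region": ["north", "south", "east"],
})

# Columns that are not numeric and would need encoding or removal.
non_numeric = dataset.select_dtypes(exclude=[np.number]).columns.tolist()

# Numeric part as a float array, as it would be passed to a scaler.
X = dataset.select_dtypes(include=[np.number]).to_numpy(dtype=float)

# False if X contains inf, -inf or NaN.
all_finite = bool(np.isfinite(X).all())
```

When `all_finite` is False, the scaler would be expected to raise the same ValueError as in the question.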
You can also try updating scikit-learn; there is a known bug in certain versions (I don't remember which one):
# if you are using conda
conda update scikit-learn
# if you are using pip
pip install -U scikit-learn
Upvotes: 1