Phd student

Reputation: 35

ValueError: Input contains infinity or a value too large for dtype('float64')

I need help. I'm working on a machine learning project and tried to import a dataset using this code:

    # Importing the libraries
    import numpy as np
    import matplotlib.pyplot as plt
    import pandas as pd

    # Importing the dataset
    dataset = pd.read_csv('Rural3.csv', low_memory=False)
    X = dataset.iloc[:, :-1].values
    y = dataset.iloc[:, 77].values

    # Splitting the dataset into the Training set and Test set
    from sklearn.model_selection import train_test_split
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)

    # Feature Scaling
    from sklearn.preprocessing import StandardScaler
    sc = StandardScaler()
    X_train = sc.fit_transform(X_train)
    X_test = sc.transform(X_test)

However, an error appears: ValueError: Input contains infinity or a value too large for dtype('float64')

What should I do? I'm a newbie in Python. Thanks in advance.

Upvotes: 0

Views: 30756

Answers (5)

sanjyay

Reputation: 76

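    import numpy as np

    df_new = df[np.isfinite(df).all(axis=1)]

This keeps only the rows in which every value is finite, i.e. it removes the rows that contain infinity or NaN values. Note that np.isfinite only works when all columns of df are numeric.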
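A small, self-contained sketch of the filter above, using a made-up DataFrame (the column names a, b and label are hypothetical). It also shows one way to apply the same filter when the frame has a non-numeric column: build the mask from the columns returned by select_dtypes only.

```python
import numpy as np
import pandas as pd

# Made-up numeric data containing +inf, NaN and -inf
df = pd.DataFrame({
    "a": [1.0, np.inf, 3.0, 4.0],
    "b": [10.0, 20.0, np.nan, -np.inf],
})

# np.isfinite returns a boolean DataFrame: True where the value is not NaN/inf
finite_mask = np.isfinite(df)

# Keep only the rows in which every column is finite
df_new = df[finite_mask.all(axis=1)]
print(df_new)  # only row 0 survives

# With a text column, np.isfinite(df) would raise a TypeError,
# so build the row mask from the numeric columns only
mixed = df.assign(label=["w", "x", "y", "z"])
numeric_part = mixed.select_dtypes(include="number")
mixed_clean = mixed[np.isfinite(numeric_part).all(axis=1)]
```

The text column is kept in mixed_clean; it is simply not used to decide which rows are dropped.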

Upvotes: 0

Induraj PR

Reputation: 304

This solution works well; it fixed the error for me while power-transforming:

    import numpy as np
    df = df[~df.isin([np.nan, np.inf, -np.inf]).any(axis=1)]
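To see what the isin filter does, here is a sketch on a made-up DataFrame (columns x, y and z are hypothetical). It also prints how many bad cells each column contains, which helps locate the offending feature, and shows the common replace-then-dropna idiom, which keeps the same rows here.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "x": [1.0, -np.inf, np.inf, 4.0],
    "y": ["a", "b", "c", "d"],        # a text column is fine for isin
    "z": [0.5, np.nan, 1.5, 2.5],
})

# True wherever a cell is NaN, +inf or -inf
bad = df.isin([np.nan, np.inf, -np.inf])
print(bad.sum())  # number of bad cells per column

# Drop every row that has at least one bad cell
cleaned = df[~bad.any(axis=1)]

# Alternative idiom: turn +/-inf into NaN, then drop rows containing NaN
alt = df.replace([np.inf, -np.inf], np.nan).dropna()
```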

Upvotes: 4

yogi Datascience

Reputation: 1

This error can be quite misleading. If certain features in your dataset have blank values, you can get this type of error even then. To resolve it:

  1. Export the DataFrame to CSV so that you can inspect it. In the code below, df is the DataFrame:

        compression_opts = dict(method='zip', archive_name='out.csv')
        df.to_csv('out.zip', index=False, compression=compression_opts)

     You can also list the indices of the rows in which a given column contains an empty string:

        df[df['column_name'] == ''].index

  2. Identify the features that have blank values by inspecting the exported CSV.

  3. Remove the complete records that have blank values with the code below. Empty fields in a CSV file are read as NaN by pd.read_csv, which is what dropna removes:

        df = df.dropna(subset=['column_name'])
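The steps above can be sketched on a small made-up DataFrame (columns f1 and f2 are hypothetical). One caveat worth showing: dropna only removes real missing values, so if the blanks are stored as empty or whitespace-only strings, one option is to convert them to NaN first, for example with a regular-expression replace.

```python
import numpy as np
import pandas as pd

# Hypothetical data: blank cells stored as empty / whitespace-only strings
df = pd.DataFrame({
    "f1": ["1.0", "", "3.0", "4.0"],
    "f2": ["2.0", "5.0", "   ", "8.0"],
})

# Rows where f1 is an empty string (the check from step 1)
print(df[df["f1"] == ""].index.tolist())  # [1]

# dropna alone removes nothing: '' and '   ' are strings, not NaN
print(len(df.dropna()))  # 4

# Turn empty or whitespace-only strings into NaN, then drop those rows
df = df.replace(r"^\s*$", np.nan, regex=True)
df = df.dropna(subset=["f1", "f2"])

# The remaining text values can now be converted to floats
df = df.astype(float)
```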

Upvotes: 0

Yonas Kassa

Reputation: 3720

I would suggest checking whether you have null values. After loading the dataset with pandas, do the following:

    dataset = dataset.dropna()

Also make sure that your X values are numeric; you can check this with either dataset.describe() or dataset.info():

    dataset.info()  # prints info about the dataset columns and their dtypes
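A sketch of the "make sure X is numeric" check on a hypothetical frame (dataset, age, income and target are made-up names) in which one column was read as text. It uses pd.to_numeric with errors="coerce", which turns anything that cannot be parsed into NaN, and treats +/-inf as missing too before calling dropna.

```python
import numpy as np
import pandas as pd

# Hypothetical frame where 'income' was read as text (dtype object)
dataset = pd.DataFrame({
    "age": [25, 32, 47],
    "income": ["1000", "n/a", "2500"],
    "target": [0, 1, 0],
})

dataset.info()  # 'income' is listed as object, not float64

# Coerce every column to numbers; unparseable entries become NaN
numeric = dataset.apply(pd.to_numeric, errors="coerce")

# Treat +/-inf as missing as well, then drop the incomplete rows
numeric = numeric.replace([np.inf, -np.inf], np.nan).dropna()
print(numeric.dtypes)
```

Coercion silently turns bad entries into NaN, so it can be worth comparing len(dataset) with len(numeric) to see how many rows were dropped before training.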

You can also try updating scikit-learn; there is a known bug in certain versions of it (I don't remember which one).

    # if you are using conda
    conda update scikit-learn
    # if you are using pip
    pip install -U scikit-learn

Upvotes: 1

Ari7

Reputation: 19

Try normalizing your data if it contains very large values. You can find more info here.

Upvotes: 0
