Stephen Ortmann

Reputation: 1

Input contains infinity or a value too large for dtype('float64')

I am pretty new to Python and am following a tutorial to normalize and scale all of my data, but I keep getting an error. I am using scikit-learn with pandas. I've searched around and tried just about everything I can think of, but the error persists.

The traceback points to preprocessing.scale:

ValueError: Input contains infinity or a value too large for dtype('float64').

The column that triggers the error has a min of -10.3800048828125 and a max of 10.209991455078123. All columns are float64 or int64 (this column is not int64). I've tried multiple methods of getting rid of the infinities and NaNs, but none of them seem to work. If anyone has any advice, it would be greatly appreciated!

The code that is getting the issue is here:

def preprocess_df(df):
    df = df.drop('future', 1)
    df.replace([np.inf, -np.inf], np.nan)
    df.fillna(method='bfill', inplace=True)
    df.dropna(inplace=True)

    for col in df.columns:
        print("Trying Column: " + col)
        if col != "target":
            df[col] = df[col].pct_change()
            df.dropna(inplace=True)
            df[col] = preprocessing.scale(df[col].values)
    df.dropna(inplace=True)

    sequential_data = []
    prev_days = deque(maxlen=SEQ_LEN)

    for i in df.values:
        prev_days.append([n for n in i[:-1]]) #appends every column to the prev days list, except for target (we don't want that to be known)
        if len(prev_days) == SEQ_LEN:
            sequential_data.append([np.array(prev_days), i[:-1]])

    random.shuffle(sequential_data)

Upvotes: 0

Views: 1891

Answers (1)

Gilseung Ahn

Reputation: 2614

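Here is your problem: df.replace([np.inf, -np.inf], np.nan) returns a new DataFrame rather than modifying df in place, and the result is never assigned, so the infinities stay in df.

Change the line to df = df.replace([np.inf, -np.inf], np.nan).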
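A minimal sketch of the difference, using made-up data (the column name "close" and all values are assumptions, not the asker's data). The second half illustrates a further possibility that the post does not confirm: pct_change itself yields inf wherever the previous value is 0, so it may be worth cleaning again after that step.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; "close" and its values are illustrative only.
df = pd.DataFrame({"close": [1.0, np.inf, 3.0, -np.inf, 5.0]})

# replace() returns a new DataFrame; without assignment the result is discarded.
df.replace([np.inf, -np.inf], np.nan)
still_has_inf = bool(np.isinf(df["close"]).any())

# Assigning the result keeps the change.
cleaned = df.replace([np.inf, -np.inf], np.nan)
cleaned_has_inf = bool(np.isinf(cleaned["close"]).any())

print(still_has_inf, cleaned_has_inf)

# Assumption: the real data may contain zeros. A zero followed by a
# non-zero value makes pct_change() produce inf (division by zero).
prices = pd.Series([0.0, 2.0, 4.0])
raw_changes = prices.pct_change()
raw_has_inf = bool(np.isinf(raw_changes).any())

# Cleaning after pct_change() removes both the leading NaN and the inf.
changes = raw_changes.replace([np.inf, -np.inf], np.nan).dropna()
print(raw_has_inf, changes.tolist())
```

If the second case applies to your data, the replace and dropna calls would need to run after pct_change and before preprocessing.scale, not only at the top of the function.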

Upvotes: 1
