Reputation: 1
I am fairly new to Python and am following a tutorial to normalize and scale all of my data, but I keep getting an error. I am using scikit-learn with pandas. I've searched around and tried just about everything I can think of, without success.
The error traces back to preprocessing.scale:
ValueError: Input contains infinity or a value too large for dtype('float64').
The column that's kicking back the error has a min of -10.3800048828125 and a max of 10.209991455078123. All data types are float64 or int64 (not in this column though). I've tried multiple methods of getting rid of the infinities and NaNs, but none of them seem to be working. If anyone has any advice, it would be greatly appreciated!
The code that is getting the issue is here:
def preprocess_df(df):
    df = df.drop('future', 1)
    df.replace([np.inf, -np.inf], np.nan)
    df.fillna(method='bfill', inplace=True)
    df.dropna(inplace=True)
    for col in df.columns:
        print("Trying Column: " + col)
        if col != "target":
            df[col] = df[col].pct_change()
            df.dropna(inplace=True)
            df[col] = preprocessing.scale(df[col].values)
    df.dropna(inplace=True)
    sequential_data = []
    prev_days = deque(maxlen=SEQ_LEN)
    for i in df.values:
        prev_days.append([n for n in i[:-1]])  # appends every column to the prev days list, except for target (we don't want that to be known)
        if len(prev_days) == SEQ_LEN:
            sequential_data.append([np.array(prev_days), i[:-1]])
    random.shuffle(sequential_data)
Upvotes: 0
Views: 1891
Reputation: 2614
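Here is your problem: df.replace([np.inf, -np.inf], np.nan) returns a new DataFrame instead of modifying df in place, so the result is thrown away and the infinities stay in df.
Change the line to df = df.replace([np.inf, -np.inf], np.nan) so the result is kept.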
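To illustrate the point about replace, here is a minimal sketch on a made-up toy frame (the `price` column and its values are hypothetical, not taken from the question). It also rests on an assumption the question does not confirm: that the infinities are produced by `pct_change()` dividing by a zero in the previous row. If that is the case, the replacement needs to run after `pct_change()`, not only before it.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: the 0.0 makes pct_change() divide by zero on the next row.
df = pd.DataFrame({"price": [1.0, 0.0, 2.0, 4.0], "target": [0, 1, 1, 0]})

# pct_change() yields NaN for the first row and inf where the previous value was 0.
df["price"] = df["price"].pct_change()

# Without assignment the result is discarded, so df still contains inf.
df.replace([np.inf, -np.inf], np.nan)
still_has_inf = bool(np.isinf(df["price"]).any())

# Assigning the result back swaps inf for NaN, which dropna() can then remove.
df = df.replace([np.inf, -np.inf], np.nan)
df = df.dropna()
clean = bool(np.isfinite(df["price"]).all())

print(still_has_inf, clean)
```

In the question's loop, that would mean doing the assigned replace between `pct_change()` and `dropna()`, so that `preprocessing.scale` only ever sees finite values.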
Upvotes: 1