Reputation: 31
Hoping someone on here can help, I'm struggling to get prediction values back to "unscaled" values. I'm using StandardScaler() in sklearn.preprocessing. My dataset is a numpy array with 4 columns (called dataset).
I've tried:
# 1) full dataset scaled, then split:
X_train, X_test, Y_train, Y_test = model_selection.train_test_split(X, Y, test_size=0.4)
Y_pred = adam.predict(X_test)
# model looks good, but obviously I can't inverse_transform(Y_pred)

# 2) scaled X_train and X_test individually
# model comes out bad

# 3) scaled X_train, X_test, Y_train, and Y_test individually
# model comes out bad
Am I applying scaling in an incorrect way?
Any suggestions on how to inverse scale Y_pred after running the model on scaled data?
Thanks for any help on this!
Upvotes: 0
Views: 8697
Reputation: 1245
Here is an example of what I have used to scale data for use in an LSTM model. The data set is Open, High, Low, Close financial data. The model uses past values of Open, High, Low, and Close to try and predict what the Close will be at some point in the future, so all of the data needs to be scaled, but the output Close needs to be inverse scaled back into an actual price point.
Start by instantiating two scaler objects depending on what scaler you are using:
from sklearn.preprocessing import MinMaxScaler
import numpy as np
scaler = MinMaxScaler(feature_range = (0, 1))
scaler_single = MinMaxScaler(feature_range = (0, 1))
Use scaler to transform the Open, High, and Low data and scaler_single to scale the Close data. Then build the scaled data set by concatenating the results. ohlcv is a Pandas DataFrame object.
scaled_data = np.concatenate([scaler.fit_transform(ohlcv[['Open', 'High', 'Low']]),
scaler_single.fit_transform(ohlcv[['Close']])], axis = 1)
Now, in order to inverse scale the outputted Close data, use the inverse_transform method of the scaler_single object. predicted_prices is the array returned by my model.
real_prices = scaler_single.inverse_transform(predicted_prices)
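A minimal, self-contained sketch of this two-scaler pattern (using made-up synthetic OHLC numbers in place of a real price feed; the model step is faked by reusing the scaled Close column as the "prediction"):

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Synthetic OHLC data standing in for a real price feed
rng = np.random.default_rng(0)
base = rng.uniform(100, 200, size=50)
ohlcv = pd.DataFrame({
    'Open': base,
    'High': base + 1.0,
    'Low': base - 1.0,
    'Close': base + rng.uniform(-0.5, 0.5, size=50),
})

# One scaler for the input-only columns, one for the column we must un-scale later
scaler = MinMaxScaler(feature_range=(0, 1))
scaler_single = MinMaxScaler(feature_range=(0, 1))

scaled_data = np.concatenate([scaler.fit_transform(ohlcv[['Open', 'High', 'Low']]),
                              scaler_single.fit_transform(ohlcv[['Close']])], axis=1)

# Stand-in for model output; inverse_transform expects a 2-D array
# of shape (n_samples, 1), hence the 3:4 slice rather than [:, 3]
predicted_prices = scaled_data[:, 3:4]
real_prices = scaler_single.inverse_transform(predicted_prices)
```

Because scaler_single was fit only on Close, its inverse_transform maps the model's scaled output straight back to price units; a scaler fit on all four columns could not do this for a single-column prediction.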
I hope that helps.
Upvotes: 0
Reputation: 31
Here was my workaround:
import statistics
import numpy as np

#standard scaler used to condition data
def scaler(x):
    mu = statistics.mean(x)
    stddev = statistics.stdev(x)
    standardized = (x - mu) / stddev
    return standardized
#Split data into X, Y and condition (X are the "features", Y is the forecasted/predicted price or "target")
Y = dataset[:,6]
ymu = statistics.mean(Y) #before scaler transform, get mean to inverse scaler transform after model
ystddev = statistics.stdev(Y) #before scaler transform, get stdev
Y = scaler(Y) #scale (i.e. condition/transform) forecasted price data
Xprice = dataset[:,4]
Xvolume = dataset[:,5]
Xprice = scaler(Xprice) #scale (i.e. condition/transform) price data
Xvolume = scaler(Xvolume) #scale (i.e. condition/transform) volume data
X = np.vstack((Xprice, Xvolume)).T #create 2D array of scaled features
Then after test/train split and running the model:
Y_pred = adam.predict(X_test)
#undo scaling after model is run to get back to original scale
Y_test_inverse = (Y_test * ystddev) + ymu
Y_pred_inverse = (Y_pred * ystddev) + ymu
This produced good results, with Y_test and Y_pred coming back on the actual scale of the Y data (as far as I can tell).
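For reference, the same round trip can be done with sklearn's StandardScaler itself by fitting a dedicated scaler on Y, so inverse_transform handles the un-scaling (a sketch on a made-up Y array, not the dataset above; note that statistics.stdev is the sample standard deviation while StandardScaler uses the population standard deviation, so the two scale factors differ slightly):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up target values standing in for the real Y column
Y = np.array([10.0, 12.0, 15.0, 11.0, 14.0])

# Fit a scaler on Y alone; sklearn scalers expect 2-D input
y_scaler = StandardScaler()
Y_scaled = y_scaler.fit_transform(Y.reshape(-1, 1))

# After the model runs, undo the scaling with the same scaler object
Y_back = y_scaler.inverse_transform(Y_scaled)
```

Keeping a separate scaler object for Y avoids having to stash the mean and standard deviation by hand.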
Upvotes: 1