AK_Eyes

Reputation: 31

Scikit-learn Scaling Question (inverse_transform)

Hoping someone here can help: I'm struggling to get prediction values back to "unscaled" values. I'm using StandardScaler() from sklearn.preprocessing. My dataset is a numpy array with 4 columns (called dataset).

I've tried:

# Attempt 1: scaled the full dataset, then split
X_train, X_test, Y_train, Y_test = model_selection.train_test_split(X, Y, test_size = 0.4)

# model looks good, but obviously I can't inverse_transform(Y_pred)
Y_pred = adam.predict(X_test)

# Attempt 2: scaled X_train and X_test individually
# model comes out bad

# Attempt 3: scaled X_train, X_test, Y_train and Y_test individually
# model comes out bad

Am I applying the scaling incorrectly?

Any suggestions on how to inverse-scale Y_pred from a run of the model on scaled data?

Thanks for any help on this!

Upvotes: 0

Views: 8697

Answers (2)

Austin Mackillop

Reputation: 1245

Here is an example of what I have used to scale data for use in an LSTM model. The data set is Open, High, Low, Close financial data. The model uses past values of Open, High, Low, and Close to try to predict what the Close will be at some point in the future, so all of the data needs to be scaled, but the predicted Close needs to be inverse-scaled back into an actual price.

Start by instantiating two scaler objects depending on what scaler you are using:

from sklearn.preprocessing import MinMaxScaler
import numpy as np

scaler = MinMaxScaler(feature_range = (0, 1))
scaler_single = MinMaxScaler(feature_range = (0, 1))

Use scaler to transform the Open, High, and Low data and scaler_single to scale the Close data. Then build the scaled data set by concatenating the results. ohlcv is a Pandas DataFrame object.

scaled_data = np.concatenate([scaler.fit_transform(ohlcv[['Open', 'High', 'Low']]),
                              scaler_single.fit_transform(ohlcv[['Close']])], axis = 1)

Now, in order to inverse-scale the Close values output by the model, use the inverse_transform method of the scaler_single object. predicted_prices is the array returned by my model.

real_prices = scaler_single.inverse_transform(predicted_prices)
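
To make the whole flow concrete, here is a minimal, self-contained sketch of the same two-scaler pattern, with a small made-up DataFrame in place of ohlcv and a slice of the scaled data standing in for real model output (all names and numbers below are illustrative only):

import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Made-up OHLC data standing in for the real ohlcv DataFrame
ohlcv = pd.DataFrame({
    'Open':  [10.0, 11.0, 12.0, 11.5],
    'High':  [10.5, 11.6, 12.4, 12.0],
    'Low':   [ 9.8, 10.9, 11.7, 11.2],
    'Close': [10.2, 11.4, 12.1, 11.8],
})

# One scaler for the features, a separate one for the target column
scaler = MinMaxScaler(feature_range = (0, 1))
scaler_single = MinMaxScaler(feature_range = (0, 1))

scaled_data = np.concatenate([scaler.fit_transform(ohlcv[['Open', 'High', 'Low']]),
                              scaler_single.fit_transform(ohlcv[['Close']])], axis = 1)

# Stand-in for model output: inverse_transform expects a 2-D array with the
# same single column that scaler_single was fitted on, i.e. shape (n, 1)
predicted_prices = scaled_data[:, [3]]
real_prices = scaler_single.inverse_transform(predicted_prices)
print(real_prices.ravel())  # back in actual price units

Because scaler_single only ever sees the Close column, its inverse_transform maps scaled predictions straight back to prices without touching the other features.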

I hope that helps.

Upvotes: 0

AK_Eyes

Reputation: 31

Here was my workaround:

import statistics
import numpy as np

# standard scaler used to condition the data
def scaler(x):
    mu = statistics.mean(x)
    stddev = statistics.stdev(x)
    standardized = (x - mu) / stddev
    return standardized

# Split data into X and Y and condition them (X are the "features", Y is the forecasted/predicted price or "target")
Y = dataset[:, 6]
ymu = statistics.mean(Y)       # before scaling, keep the mean so the transform can be inverted after the model runs
ystddev = statistics.stdev(Y)  # before scaling, keep the stdev for the same reason
Y = scaler(Y)                  # scale (i.e. condition/transform) the forecasted price data
Xprice = dataset[:, 4]
Xvolume = dataset[:, 5]
Xprice = scaler(Xprice)        # scale (i.e. condition/transform) the price data
Xvolume = scaler(Xvolume)      # scale (i.e. condition/transform) the volume data
X = np.vstack((Xprice, Xvolume)).T  # create a 2-D array of scaled features

Then, after the train/test split and running the model:

Y_pred = adam.predict(X_test)
#undo scaling after model is run to get back to original scale
Y_test_inverse = (Y_test * ystddev) + ymu
Y_pred_inverse = (Y_pred * ystddev) + ymu
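
One caveat if you ever compare this manual approach with sklearn's StandardScaler (the numbers below are just illustrative): statistics.stdev is the sample standard deviation (ddof=1), while StandardScaler divides by the population standard deviation (ddof=0), so the two scalings differ slightly on the same column:

import statistics
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
manual = (x - statistics.mean(x)) / statistics.stdev(x)  # sample stdev (ddof=1)
sk_style = (x - x.mean()) / x.std(ddof=0)                # what StandardScaler computes
print(np.allclose(manual, sk_style))                     # False: the scale factors differ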

This workaround produced good results, with the actual Y data and the Y predictions ending up on the correct scale (as far as I can tell).
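
For anyone who would rather stay inside sklearn, a rough equivalent of this workaround is to keep separate StandardScaler objects for X and Y so that inverse_transform replaces the manual un-scaling. This is only a sketch: the x_scaler/y_scaler names and the dummy dataset array are placeholders, assuming the same column layout as above.

import numpy as np
from sklearn.preprocessing import StandardScaler

# Dummy stand-in for the real `dataset` array; only columns 4-6 are used below
rng = np.random.default_rng(0)
dataset = rng.random((100, 7))

# Separate scalers: one for the feature columns, one for the target column
x_scaler = StandardScaler()
y_scaler = StandardScaler()

X = x_scaler.fit_transform(dataset[:, [4, 5]])  # price and volume features
Y = y_scaler.fit_transform(dataset[:, [6]])     # target; StandardScaler needs a 2-D array

# ... train/test split and model fitting would go here, as in the workaround above ...

# Pretend these are scaled predictions from the model, shaped (n_samples, 1)
Y_pred = Y[:10]
Y_pred_inverse = y_scaler.inverse_transform(Y_pred)  # back on the original price scale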

Upvotes: 1
