Reputation: 169
I built a regression model to predict house prices from several independent variables, and I obtained the regression equation with its coefficients. I used StandardScaler() to scale my variables before splitting the data set. Now I want to predict the house price for new values of the independent variables using my regression model. Can I plug the new values directly into the equation and calculate the price, or should I pass them through StandardScaler() first?
Upvotes: 0
Views: 269
Reputation: 1669
To answer your question: yes, you have to process your test input as well, but consider the following explanation.
StandardScaler() standardizes features by removing the mean and scaling to unit variance.
If you fit the scaler on the whole dataset and then split, the scaler uses all values, including those that end up in the test set, when computing the mean and variance.
The test set should ideally not be preprocessed together with the training data, so that there is no 'peeking ahead'. Fit the preprocessing on the training data only, and once the model is created, apply the same preprocessing parameters learned from the training set to the test set, as though the test set didn't exist before.
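As a rough sketch of the split-then-scale order described above: the data here is synthetic and made up purely for illustration (two hypothetical features standing in for something like area and number of rooms), and the split ratio and random seeds are arbitrary assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic, made-up data for illustration only (not from the question).
rng = np.random.default_rng(0)
X = rng.normal(loc=[1500.0, 3.0], scale=[400.0, 1.0], size=(200, 2))
y = 100.0 * X[:, 0] + 20000.0 * X[:, 1] + rng.normal(scale=5000.0, size=200)

# Split first, so the test rows play no part in the scaling statistics.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learns mean_ and scale_ from the training rows only
X_test_scaled = scaler.transform(X_test)        # reuses the training mean_ and scale_
```

After this, scaler.mean_ equals the column means of X_train alone, so the test rows are scaled with statistics they did not contribute to.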
Upvotes: 1
Reputation: 1741
Yes, you need to preprocess the new values. If you have scaled your training data and fitted a model to that scaled data, then any new data fed into the model should undergo equivalent preprocessing as well. This is standard practice, as it ensures that the model is always given input in a consistent form. The caveat is that you should use transform instead of fit_transform on the new data, so that it is scaled with the mean and variance learned from the training data.
The process might look as follows:
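from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)  # fit on the training data only
new_data = scaler.transform(new_data)  # reuse the training mean and variance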
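To connect this back to the regression equation from the question, here is a minimal sketch assuming the model is scikit-learn's LinearRegression (the question does not say which estimator was used). The training data and the new house's values are invented for illustration. It suggests that plugging scaled values into the equation matches model.predict, while plugging in raw values does not.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Synthetic, made-up training data for illustration only.
rng = np.random.default_rng(1)
X_train = rng.normal(loc=[1500.0, 3.0], scale=[400.0, 1.0], size=(100, 2))
y_train = 100.0 * X_train[:, 0] + 20000.0 * X_train[:, 1] + rng.normal(scale=5000.0, size=100)

# Fit the scaler and the model on the training data.
scaler = StandardScaler()
model = LinearRegression().fit(scaler.fit_transform(X_train), y_train)

# A hypothetical new house, in the original (unscaled) units.
new_data = np.array([[1800.0, 4.0]])
new_scaled = scaler.transform(new_data)  # transform only, no refitting

pred_model = model.predict(new_scaled)
# The regression equation by hand, using the scaled inputs.
pred_manual = model.intercept_ + new_scaled @ model.coef_
# The same equation with raw inputs: the coefficients do not apply to these units.
pred_wrong = model.intercept_ + new_data @ model.coef_
```

Here pred_manual agrees with pred_model, whereas pred_wrong is far off, because the coefficients were estimated for standardized inputs.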
There is a detailed write-up on this topic in another thread that might be of interest to you.
Upvotes: 2