Reputation: 107
I'd like to check something regarding the steps of standard scaling:
ss = StandardScaler()
X_train = ss.fit_transform(X_train)
X_test = ss.transform(X_test)
X_unseen = ss.fit_transform(df_test)
df_test is basically a .csv file of totally unseen data.
For the above code, is it OK to call ss.fit_transform(df_test) when this ss has already been fit on X_train with fit_transform? Has this ss already "learned" from the X_train dataset, so that I would need to instantiate a new StandardScaler() to fit_transform(df_test)?
Thank you.
Upvotes: 1
Views: 147
Reputation: 6260
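When you use a StandardScaler, you fit it only once, on the training data. Calling fit_transform again recomputes the mean and standard deviation from the new data, so it is no longer the same scaler, and the unseen data ends up on a different scale from the one your following steps/algorithm were trained on. So for the test and unseen data, call transform only: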
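from sklearn.preprocessing import StandardScaler

ss = StandardScaler()
X_train = ss.fit_transform(X_train)  # learn mean/std from the training data only
X_test = ss.transform(X_test)        # reuse the training mean/std
X_unseen = ss.transform(df_test)     # same here: transform, do not refit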
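To see the difference, here is a minimal sketch with made-up toy arrays (the values are assumptions for illustration, not the asker's data). It shows that calling fit_transform a second time on the same scaler replaces the statistics learned from the training set, which is why the unseen data should go through transform only.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy data, made up for illustration: one feature with different ranges.
X_train = np.array([[0.0], [2.0], [4.0]])       # mean 2.0
X_unseen = np.array([[10.0], [12.0], [14.0]])   # mean 12.0

ss = StandardScaler()
ss.fit(X_train)
train_mean = ss.mean_.copy()  # statistics learned from the training data

# Intended usage: scale unseen data with the training statistics.
unseen_scaled = ss.transform(X_unseen)

# What the question's code does: refitting discards the training statistics
# and centers the unseen data on its own mean instead.
unseen_refit = ss.fit_transform(X_unseen)
refit_mean = ss.mean_.copy()

print("mean after fitting on train:", train_mean)
print("mean after refitting on unseen:", refit_mean)
print("transform only:", unseen_scaled.ravel())
print("fit_transform again:", unseen_refit.ravel())
```

With transform only, the unseen values stay far from zero because they are measured against the training mean. After refitting they are centered around zero, so a model trained on the original scaling would see them as if they were typical training points.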
Upvotes: 2