Reputation: 199
I'm having trouble working out which of the 3 options below is the correct way to standardize my data:
# Option 1
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train_std = sc.fit_transform(X_train)
X_test_std = sc.transform(X_test)
# Option 2
from sklearn.preprocessing import StandardScaler
sc = StandardScaler().fit(X_train)
X_train_std = sc.transform(X_train)
X_test_std = sc.transform(X_test)
# Option 3
from sklearn.preprocessing import StandardScaler
sc = StandardScaler().fit(X_train)
X_train_std = sc.fit_transform(X_train)
X_test_std = sc.transform(X_test)
A short explanation would be appreciated. Thanks.
Upvotes: 3
Views: 3194
Reputation: 2851
All three are correct and yield the same result; the 3rd one just does a redundant refit. The 1st one is, subjectively, the most readable.
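fit_transform() fits the scaler on the training data and then transforms the training set with the fitted parameters. It is equivalent to calling fit() followed by transform() on the same data.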
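To make the equivalence concrete, here is a minimal pure-Python sketch of what a standard scaler does for a single feature. TinyScaler is a hypothetical stand-in, not scikit-learn's implementation, and the sample values are made up. It assumes the population standard deviation, which is what StandardScaler uses.

```python
from statistics import fmean, pstdev


class TinyScaler:
    """Hypothetical single-feature stand-in for a standard scaler."""

    def fit(self, values):
        # Learn the scaling parameters from the given data.
        self.mean_ = fmean(values)
        self.scale_ = pstdev(values)
        return self

    def transform(self, values):
        # Apply the previously learned parameters; nothing is re-learned here.
        return [(v - self.mean_) / self.scale_ for v in values]

    def fit_transform(self, values):
        # Shorthand for fit() followed by transform() on the same data.
        return self.fit(values).transform(values)


x_train = [1.0, 2.0, 3.0, 4.0]  # made-up training feature
x_test = [2.0, 5.0]             # made-up test feature

# Option 1: fit and transform the training set in one call.
sc1 = TinyScaler()
train1 = sc1.fit_transform(x_train)
test1 = sc1.transform(x_test)

# Option 2: fit first, then transform separately.
sc2 = TinyScaler().fit(x_train)
train2 = sc2.transform(x_train)
test2 = sc2.transform(x_test)

# Option 3: fit, then fit again inside fit_transform (redundant but harmless).
sc3 = TinyScaler().fit(x_train)
train3 = sc3.fit_transform(x_train)
test3 = sc3.transform(x_test)

print(train1 == train2 == train3, test1 == test2 == test3)
```

In all three options the parameters are learned from the training data only, which is why the outputs match.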
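The only incorrect way to use scaling is to fit the scaler on the test set: the test data must be transformed with the mean and standard deviation learned from the training data.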
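As a rough illustration of why fitting on the test set is a problem, the sketch below standardizes a made-up test feature twice: once with training statistics and once with statistics refit on the test set. The standardize helper and all values are hypothetical, and plain Python stands in for the scaler here.

```python
from statistics import fmean, pstdev

x_train = [1.0, 2.0, 3.0, 4.0]  # made-up training feature
x_test = [10.0, 12.0]           # made-up test feature, far from the training range


def standardize(values, mean, scale):
    # Hypothetical helper: apply already-computed scaling parameters.
    return [(v - mean) / scale for v in values]


# Correct: the statistics come from the training set only.
train_mean, train_scale = fmean(x_train), pstdev(x_train)
test_ok = standardize(x_test, train_mean, train_scale)

# Incorrect: refitting on the test set gives it its own mean and scale.
test_mean, test_scale = fmean(x_test), pstdev(x_test)
test_bad = standardize(x_test, test_mean, test_scale)

print(test_ok)   # large positive values: the test points sit far above the training mean
print(test_bad)  # centred on zero: the shift relative to the training data is lost
```

With the refit, the test values end up on a different scale from the one the model was trained on, so its inputs no longer mean the same thing.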
Upvotes: 5