Lilly_Co

Reputation: 199

What is the correct code to use StandardScaler() on x_train and x_test in sklearn?

I'm having trouble finding the correct code to standardize my data among the 3 options below:

# Option 1
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train_std = sc.fit_transform(X_train)
X_test_std = sc.transform(X_test)

# Option 2
from sklearn.preprocessing import StandardScaler
sc = StandardScaler().fit(X_train)
X_train_std = sc.transform(X_train)
X_test_std = sc.transform(X_test)

# Option 3
from sklearn.preprocessing import StandardScaler
sc = StandardScaler().fit(X_train)
X_train_std = sc.fit_transform(X_train)
X_test_std = sc.transform(X_test)

Thanks in advance for a short explanation.

Upvotes: 3

Views: 3194

Answers (1)

dx2-66

Reputation: 2851

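All three are correct and yield the same result; the 3rd one just does a redundant refit, since fit_transform() fits the scaler again on data it was already fitted on. The 1st one is subjectively the most readable. fit_transform() fits the scaler on the training data and then transforms the training set with the fitted parameters.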

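The only incorrect way of using the scaler is fitting it on the test set: the test data should be transformed with the mean and standard deviation learned from the training data.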
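As a rough check of the claims above, here is a minimal sketch that runs the three options on synthetic data and compares the outputs. The random X_train and X_test arrays and the variable names (sc1, train1, wrong, etc.) are assumptions made for illustration, not part of the question.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic data, assumed purely for illustration.
rng = np.random.default_rng(0)
X_train = rng.normal(loc=5.0, scale=2.0, size=(100, 3))
X_test = rng.normal(loc=5.0, scale=2.0, size=(20, 3))

# Option 1: fit and transform the training set in one call.
sc1 = StandardScaler()
train1 = sc1.fit_transform(X_train)
test1 = sc1.transform(X_test)

# Option 2: fit first, then transform both sets.
sc2 = StandardScaler().fit(X_train)
train2 = sc2.transform(X_train)
test2 = sc2.transform(X_test)

# Option 3: fit, then fit_transform (the second fit is redundant).
sc3 = StandardScaler().fit(X_train)
train3 = sc3.fit_transform(X_train)
test3 = sc3.transform(X_test)

# All three options should agree on both sets.
same_train = np.allclose(train1, train2) and np.allclose(train2, train3)
same_test = np.allclose(test1, test2) and np.allclose(test2, test3)

# The scaler's statistics come from the training data only.
stats_from_train = np.allclose(sc1.mean_, X_train.mean(axis=0))

# The incorrect variant: a scaler fitted on the test set learns
# different statistics, so its output is not comparable.
wrong = StandardScaler().fit(X_test)
wrong_differs = not np.allclose(wrong.mean_, sc1.mean_)

print(same_train, same_test, stats_from_train, wrong_differs)
```

Note that the scaled test set will generally not have exactly zero mean and unit variance, which is expected: it is scaled with the training statistics.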

Upvotes: 5
