Standard scaler produces different values before PCA

Question

I am working on a classification problem in biometrics. I compare each probe in the testing set against the gallery using the Euclidean distance.
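For context, the matching step described above can be sketched as a rank-1 nearest-neighbour search. This is a minimal illustration, not the asker's actual code: the function `rank1_identify` and the arrays `gallery`, `gallery_labels`, `probes` and `probe_labels` are hypothetical, and it assumes one feature vector per row (for example the PCA-projected walks).

```python
import numpy as np

def rank1_identify(gallery, gallery_labels, probes):
    """Return the label of the Euclidean-nearest gallery sample for each probe (hypothetical helper)."""
    # Differences between every probe and every gallery sample via broadcasting:
    # shape (n_probes, n_gallery, n_features)
    diffs = probes[:, np.newaxis, :] - gallery[np.newaxis, :, :]
    # Euclidean distances, shape (n_probes, n_gallery)
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    # Index of the closest gallery sample for each probe
    nearest = dists.argmin(axis=1)
    return gallery_labels[nearest]

# Tiny hypothetical example: 2 gallery subjects, 3 probes
gallery = np.array([[0.0, 0.0], [10.0, 10.0]])
gallery_labels = np.array(["alice", "bob"])
probes = np.array([[1.0, 0.5], [9.0, 11.0], [0.2, -0.3]])
probe_labels = np.array(["alice", "bob", "alice"])

predicted = rank1_identify(gallery, gallery_labels, probes)
recognized = int((predicted == probe_labels).sum())  # number of correctly recognized probes
print(predicted, recognized)
```

Any run-to-run change in the projected features feeds straight into `dists`, which is why a small change upstream shows up as a few more or fewer recognized probes.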

Every time I run the code I get different results. If I remove the scaler, I always get the same results.

Why does the scaler produce different values? The difference is slight: sometimes it recognizes 10 more probes, sometimes 10 fewer. Thanks to all who answer.

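from numpy import load  # assuming numpy's load for the .npy files
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()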
training_walks_matrix = load('training_imputeZero.npy')
training_scaled = scaler.fit_transform(training_walks_matrix)
testing_walks_matrix = load('testing_imputeZero.npy')
testing_scaled = scaler.transform(testing_walks_matrix)
pca = PCA(n_components=50).fit(training_scaled)
training_walks_matrix = pca.transform(training_scaled)
testing_walks_matrix = pca.transform(testing_scaled)

seralouk · Accepted Answer

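StandardScaler itself is deterministic: it only computes per-feature means and standard deviations. The only thing I can suspect is that PCA is using the arpack or randomized solver behind the scenes, because with the default svd_solver='auto' the solver is chosen automatically from the shape of the data and n_components. Both of those solvers depend on a random seed, so you need to fix the seed in order to reproduce the results.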
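To see where the randomness lives, here is a minimal sketch on synthetic data (the shapes and the generator seed are assumptions, not the asker's data). It fits StandardScaler twice, then fits PCA twice with svd_solver="randomized" forced explicitly and the same random_state. The scaler output is identical between fits; the PCA output is repeatable only because the seed is fixed.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
X = rng.normal(size=(600, 120))  # synthetic stand-in for the training matrix

# StandardScaler has no random component: same data in, same values out.
scaled_a = StandardScaler().fit_transform(X)
scaled_b = StandardScaler().fit_transform(X)

# The randomized solver draws a random matrix internally; fixing
# random_state makes that draw, and therefore the projection, repeatable.
pca_a = PCA(n_components=50, svd_solver="randomized", random_state=0).fit(scaled_a)
pca_b = PCA(n_components=50, svd_solver="randomized", random_state=0).fit(scaled_b)

print(np.array_equal(scaled_a, scaled_b))
print(np.allclose(pca_a.transform(scaled_a), pca_b.transform(scaled_b)))
```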

Try fixing the seed by passing a value to the random_state argument of the PCA instance:

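from numpy import load  # assuming numpy's load for the .npy files
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

myseed = 0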

scaler = StandardScaler()
training_walks_matrix = load('training_imputeZero.npy')
training_scaled = scaler.fit_transform(training_walks_matrix)
testing_walks_matrix = load('testing_imputeZero.npy')
testing_scaled = scaler.transform(testing_walks_matrix)

# here: a fixed random_state makes the randomized/arpack solver reproducible
pca = PCA(n_components=50, random_state=myseed).fit(training_scaled)

training_walks_matrix = pca.transform(training_scaled)
testing_walks_matrix = pca.transform(testing_scaled)
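As an alternative sketch (synthetic arrays stand in for the .npy files; shapes are assumptions), you can request the exact solver with svd_solver="full". It performs a full LAPACK SVD with no random draw, so random_state has no effect and different seeds give the same projection. The trade-off is a slower fit on large matrices, which is why 'auto' may prefer the randomized solver in the first place.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
train = rng.normal(size=(600, 120))  # synthetic stand-in for training_imputeZero.npy
test = rng.normal(size=(200, 120))   # synthetic stand-in for testing_imputeZero.npy

scaler = StandardScaler().fit(train)
train_scaled = scaler.transform(train)
test_scaled = scaler.transform(test)

def project(seed):
    # svd_solver="full" computes an exact SVD; random_state is only used by
    # the 'arpack' and 'randomized' solvers, so it is ignored here.
    pca = PCA(n_components=50, svd_solver="full", random_state=seed).fit(train_scaled)
    return pca.transform(test_scaled)

run_1 = project(seed=1)
run_2 = project(seed=2)
print(np.allclose(run_1, run_2))
```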
