user11543965

Standard scaler produces different values before PCA

I am working on a classification problem in biometrics. I compare each probe in the testing set against the gallery using the Euclidean distance.

Every time I run the code I get different results. If I remove the scaler, I always get the same results.

Why does the scaler produce different values? (The difference is slight: sometimes it recognizes 10 more probes, sometimes 10 fewer.) Thanks to all who answer.

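from numpy import load
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Fit the scaler on the training set only, then apply it to both sets
scaler = StandardScaler()
training_walks_matrix = load('training_imputeZero.npy')
training_scaled = scaler.fit_transform(training_walks_matrix)
testing_walks_matrix = load('testing_imputeZero.npy')
testing_scaled = scaler.transform(testing_walks_matrix)

# Fit PCA on the scaled training set and project both sets
pca = PCA(n_components=50).fit(training_scaled)
training_walks_matrix = pca.transform(training_scaled)
testing_walks_matrix = pca.transform(testing_scaled)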

Upvotes: 3

Views: 410

Answers (1)

seralouk

Reputation: 33147

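The only thing I can suspect is that the randomized solver is being used behind the scenes in your case, since by default (svd_solver='auto') PCA picks its solver automatically based on the shape of the data and n_components. The randomized solver (like arpack, if it is selected explicitly) depends on a random seed, so you need to fix the seed in order to reproduce the results.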

Try fixing the random seed by passing a value to the random_state argument of the PCA instance.

from numpy import load
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

myseed = 0

scaler = StandardScaler()
training_walks_matrix = load('training_imputeZero.npy')
training_scaled = scaler.fit_transform(training_walks_matrix)
testing_walks_matrix = load('testing_imputeZero.npy')
testing_scaled = scaler.transform(testing_walks_matrix)

# The only change: pass random_state so a randomized solver is seeded
pca = PCA(n_components=50, random_state=myseed).fit(training_scaled)

training_walks_matrix = pca.transform(training_scaled)
testing_walks_matrix = pca.transform(testing_scaled)
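
To check where the variation comes from, something like the following sketch may help. It uses synthetic data as a stand-in for the .npy files (the shapes and the helper name project are assumptions for illustration, not taken from the question), runs the scale-then-PCA pipeline twice, and compares the outputs. It also tries svd_solver='full', which computes an exact SVD and does not use a random seed, as a possible alternative to fixing random_state.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real data (hypothetical shapes).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(600, 520))
X_test = rng.normal(size=(100, 520))


def project(svd_solver, random_state=None):
    """Scale both sets, fit PCA on the training set, project the test set."""
    scaler = StandardScaler()
    train_scaled = scaler.fit_transform(X_train)
    test_scaled = scaler.transform(X_test)
    pca = PCA(n_components=50, svd_solver=svd_solver, random_state=random_state)
    pca.fit(train_scaled)
    return train_scaled, pca.transform(test_scaled)


# Run each configuration twice and compare the two runs.
scaled_a, seeded_a = project("randomized", random_state=0)
scaled_b, seeded_b = project("randomized", random_state=0)
_, full_a = project("full")
_, full_b = project("full")

scaler_is_deterministic = np.allclose(scaled_a, scaled_b)
seeded_pca_is_reproducible = np.allclose(seeded_a, seeded_b)
full_pca_is_reproducible = np.allclose(full_a, full_b)

print("scaler output identical across runs:", scaler_is_deterministic)
print("seeded randomized PCA identical across runs:", seeded_pca_is_reproducible)
print("full-SVD PCA identical across runs:", full_pca_is_reproducible)
```

If the scaled matrices match across runs on your real data but the PCA projections do not, the scaler itself is not the source of the variation and the solver's randomness is the more likely cause.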

Upvotes: 1
