Reputation: 595
Running scikit-learn's MLPClassifier with the same input on different devices seems to give different accuracy results, even when a global seed is set.
MWE:
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

np.random.seed(1)  # global seed

# X, y (features and labels) and the results list are defined elsewhere (not shown)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, stratify=y, random_state=np.random.RandomState(0))

nn = MLPClassifier(hidden_layer_sizes=(100, 100),
                   activation='relu',
                   solver='adam',
                   alpha=0.001,
                   batch_size=50,
                   learning_rate_init=0.01,
                   max_iter=1000,
                   random_state=np.random.RandomState(0))
nn.fit(X_train, y_train)

# Accuracy = fraction of correctly predicted labels
y_train_pred = nn.predict(X_train)
acc_train = np.sum(y_train == y_train_pred, axis=0) / X_train.shape[0]
y_test_pred = nn.predict(X_test)
acc_test = np.sum(y_test == y_test_pred, axis=0) / X_test.shape[0]
results.append([acc_train, acc_test])
How can reproducibility be guaranteed (independent of the executing device)?
Upvotes: 0
Views: 320
Reputation: 4273
I cannot reproduce this.
If something is going wrong, diagnosing it would probably need more information about the different machines. What is the output of python -c 'import sklearn; sklearn.show_versions()' on each?
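If it helps to compare the machines side by side, here is a minimal sketch that gathers a few of the same details into a dictionary that can be printed or diffed. The helper name environment_fingerprint is hypothetical (it is not part of scikit-learn), and it reports far less than sklearn.show_versions() does, for example nothing about the BLAS backend.

```python
import platform
import sys

import numpy
import scipy
import sklearn


def environment_fingerprint():
    """Collect basic platform and library versions (hypothetical helper)."""
    return {
        "platform": platform.platform(),
        "machine": platform.machine(),
        "python": sys.version.split()[0],
        "numpy": numpy.__version__,
        "scipy": scipy.__version__,
        "scikit-learn": sklearn.__version__,
    }


# Print one "name: value" line per entry so two machines can be diffed.
for key, value in environment_fingerprint().items():
    print(f"{key}: {value}")
```

Running this on each machine and diffing the output is a quick first check; any mismatch in versions or architecture is a candidate explanation for differing results.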
The following code gives me the same result on Ubuntu and Red Hat with scikit-learn==0.24.2 (I also tried numpy==1.19.1/1.20.2 and scipy==1.5.2/1.6.3).
from sklearn.neural_network import MLPClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = MLPClassifier(
    hidden_layer_sizes=(100, 100),
    activation="relu",
    solver="adam",
    alpha=0.001,
    batch_size=50,
    learning_rate_init=0.01,
    max_iter=1000,
    random_state=0,
)
clf.fit(X_train, y_train)
print(clf.score(X_train, y_train))
print(clf.score(X_test, y_test))
Output:
0.9272300469483568
0.9370629370629371
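As a sanity check before comparing machines, it may be worth confirming that two fits on the same machine agree exactly. The sketch below is only an illustration: the helper fit_once, the smaller network, and the short max_iter are my own choices to keep it fast, not the settings from the question. It passes an integer random_state, so each fit starts from the same initial weights and batch order.

```python
import warnings

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# The short training run below is expected to stop before convergence.
warnings.simplefilter("ignore", ConvergenceWarning)

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)


def fit_once(seed=0):
    """Fit a small MLP with an integer seed (hypothetical helper)."""
    clf = MLPClassifier(hidden_layer_sizes=(20,), max_iter=30, random_state=seed)
    clf.fit(X_train, y_train)
    return clf


first = fit_once()
second = fit_once()

# Compare the learned weight matrices layer by layer, then the test scores.
same_weights = all(
    np.array_equal(a, b) for a, b in zip(first.coefs_, second.coefs_)
)
same_score = first.score(X_test, y_test) == second.score(X_test, y_test)
print("identical weights:", same_weights)
print("identical test score:", same_score)
```

If both checks pass on each machine but the scores still differ between machines, the difference more likely comes from the environment (library versions, BLAS backend, hardware) than from the seeding.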
Upvotes: 1