Reputation: 1582
I am trying to learn how to use scikit-learn's MLPClassifier. For a very simple example, I thought I'd try just to get it to learn how to compute the XOR function, since I have done that one by hand as an exercise before.
However, it just spits out zeros after I try to fit the model.
import numpy as np
import sklearn.neural_network

xs = np.array([
    0, 0,
    0, 1,
    1, 0,
    1, 1
]).reshape(4, 2)
ys = np.array([0, 1, 1, 0]).reshape(4,)

model = sklearn.neural_network.MLPClassifier(
    activation='logistic', max_iter=10000, hidden_layer_sizes=(4, 2))
model.fit(xs, ys)

print('score:', model.score(xs, ys))       # outputs 0.5
print('predictions:', model.predict(xs))   # outputs [0, 0, 0, 0]
print('expected:', np.array([0, 1, 1, 0]))
I put my code in a jupyter notebook on github as well https://gist.github.com/zrbecker/6173ac01ed30be4eea9cc96e21f4896f
Why can't scikit-learn find a solution when I can show explicitly that one exists? Is the cost function getting stuck in a local minimum? Is there some kind of regularization on the parameters that forces them to stay close to 0? The parameters I used by hand were reasonably large (roughly in the range -30 to 30).
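For reference, here is a minimal NumPy sketch of one hand-built solution with logistic units and weights in that range (the specific weight values are my own illustrative choice, not the ones from the question):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

xs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

# Hidden unit 1 approximates OR(x1, x2), hidden unit 2 approximates AND(x1, x2).
W1 = np.array([[20.0, 20.0],
               [20.0, 20.0]])
b1 = np.array([-10.0, -30.0])

# Output approximates OR AND NOT AND, i.e. XOR.
W2 = np.array([20.0, -20.0])
b2 = -10.0

hidden = sigmoid(xs @ W1 + b1)
out = sigmoid(hidden @ W2 + b2)
print(np.round(out, 3))  # approximately [0, 1, 1, 0]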
Upvotes: 4
Views: 5823
Reputation: 63
Is there a magic set of parameters that lets the model predict correctly on data it hasn't seen before? None of the solutions mentioned above seems to work for that.
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier

# clf = RandomForestClassifier(random_state=0)
# clf = MLPClassifier(activation='logistic', max_iter=100, hidden_layer_sizes=(2,),
#                     alpha=0.001, solver='lbfgs', verbose=True)
clf = MLPClassifier(
    activation='logistic',
    max_iter=100,
    hidden_layer_sizes=(2,),
    solver='lbfgs')

X = [[0, 0],  # 3 samples, 2 features
     [0, 1],
     # [1, 0],
     [1, 1]]
y = [1,
     0,
     # 1,
     1]  # class of each sample

clf.fit(X, y)
assert clf.predict([[0, 1]]) == [0]
assert clf.predict([[1, 0]]) == [0]
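For what it's worth, with only three training points the prediction on the held-out [1, 0] depends on the random initialization, so the second assert can fail for some seeds. A small sketch that makes this visible (the choice of seed values is arbitrary):

from sklearn.neural_network import MLPClassifier

X = [[0, 0], [0, 1], [1, 1]]
y = [1, 0, 1]

# Train the same tiny network with different random initializations and
# check what it predicts for the point it has never seen.
for seed in range(5):
    clf = MLPClassifier(activation='logistic', max_iter=100,
                        hidden_layer_sizes=(2,), solver='lbfgs',
                        random_state=seed)
    clf.fit(X, y)
    print(seed, clf.predict([[1, 0]]))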
Upvotes: 0
Reputation: 61
The following is a simple example of XOR classification with sklearn.neural_network:
import numpy as np
import sklearn.neural_network

inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
expected_output = np.array([0, 1, 1, 0])

model = sklearn.neural_network.MLPClassifier(
    activation='logistic',
    max_iter=100,
    hidden_layer_sizes=(2,),
    solver='lbfgs')
model.fit(inputs, expected_output)

print('predictions:', model.predict(inputs))
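As a follow-up, one way to confirm the fit and peek at what was learned (coefs_ and intercepts_ are standard fitted attributes of MLPClassifier):

# Accuracy on the four training points (1.0 if the XOR mapping was learned).
print('score:', model.score(inputs, expected_output))

# The learned weight matrices and bias vectors, layer by layer.
for i, (W, b) in enumerate(zip(model.coefs_, model.intercepts_)):
    print(f'layer {i}: weights {W.shape}, biases {b.shape}')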
Upvotes: 0
Reputation: 11
Actually the point here is the 'solver' parameter, which defaults to 'adam' and works well for large data sets but not for a tiny one like this, so use 'lbfgs' instead. A bigger 'alpha' should also help:
MLPClassifier(activation='logistic', max_iter=100, hidden_layer_sizes=(3,),
              alpha=0.001, solver='lbfgs', verbose=True)
And by the way, it's possible to solve this problem with only 3 units in a single hidden layer, as in hidden_layer_sizes=(3,) above.
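Put together as a runnable sketch on the XOR data from the question (whether it converges to a perfect fit can still depend on the random initialization):

import numpy as np
from sklearn.neural_network import MLPClassifier

xs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
ys = np.array([0, 1, 1, 0])

clf = MLPClassifier(activation='logistic', max_iter=100,
                    hidden_layer_sizes=(3,), alpha=0.001,
                    solver='lbfgs', verbose=True)
clf.fit(xs, ys)

print('score:', clf.score(xs, ys))
print('predictions:', clf.predict(xs))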
Upvotes: 1
Reputation: 402263
It appears a logistic activation is the root cause here.
Change your activation to either tanh or relu (my favourite). Demo:
model = sklearn.neural_network.MLPClassifier(
    activation='relu', max_iter=10000, hidden_layer_sizes=(4, 2))
model.fit(xs, ys)
Outputs for this model:
score: 1.0
predictions: [0 1 1 0]
expected: [0 1 1 0]
It's always a good idea to experiment with different network configurations before you settle on the best one or give up altogether.
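If you want to compare configurations systematically, a small sketch along these lines works, reusing xs, ys and the import from the question (exact scores vary with the random initialization, so a fixed random_state makes runs reproducible):

# Fit the same architecture with each activation and compare training accuracy.
for activation in ('logistic', 'tanh', 'relu'):
    model = sklearn.neural_network.MLPClassifier(
        activation=activation, max_iter=10000,
        hidden_layer_sizes=(4, 2), random_state=0)
    model.fit(xs, ys)
    print(activation, model.score(xs, ys))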
Upvotes: 5