Jack Fleeting

Reputation: 24930

Back to basics for the XOR problem - fundamentally confused

This is something that has been bothering me for a while about XOR and MLPs; it may be basic (if so, apologies in advance), but I would like to know.

There are many approaches to solving XOR with MLP, but generally they look like this:

from sklearn.neural_network import MLPClassifier

X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]

model = MLPClassifier(
    activation='relu', max_iter=1000, hidden_layer_sizes=(4,2))

Now to fit the model:

model.fit(X, y)

And, guess what?

print('score:', model.score(X, y)) 

outputs a perfect

score: 1.0

But what is being predicted and scored? In the case of XOR we have a dataset which, by definition(!), has four rows, two binary features and one binary label. There are no standard X_train, y_train, X_test, y_test splits to work with. By definition, again, there is no unseen data for the model to digest.

The prediction takes place in the form of

model.predict(X)

which is exactly the same X that training was performed on.

So doesn't the model just spit back the y it was trained on? How do we know the model "learned" anything?

EDIT: Just to try to clarify what baffles me - the features have 2 and only 2 unique values; the 2 unique values have 4 and only 4 possible combinations. The right label for each possible combination is already present in the label column. So what is there for the model to "learn" when fit() is called? And how is this "learning" performed? How can the model ever be "wrong" when it has access to the "right" answer for each possible combination of inputs?
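Interestingly, a purely linear model makes the contrast visible (a quick sketch with sklearn's LogisticRegression, not part of my original code): it gets exactly the same four rows and the same labels, yet it cannot reach a perfect score, which suggests fit() really is doing something:

from sklearn.linear_model import LogisticRegression

X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]

# same four rows, same labels - but no straight line can put (0,1) and
# (1,0) on one side and (0,0) and (1,1) on the other
linear = LogisticRegression().fit(X, y)
print('score:', linear.score(X, y))  # 0.5 here; a linear model can never hit 1.0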

Again, sorry for what is probably a very basic question.

Upvotes: 0

Views: 1115

Answers (1)

Bahman Rouhani

Reputation: 1259

The key thing is that the XOR problem was proposed to demonstrate that some models can learn non-linearly separable problems and some models can't.

So when a model gets 1.0 accuracy on the dataset you mentioned, that is notable: it has learned a non-linearly separable problem. The fact that it fits the training data is enough to show that it can [potentially] learn non-linear decision boundaries. Notice that if this weren't the case, its accuracy would be capped: a model that splits the 2D plane into two half-planes with a single line can classify at most 3 of the 4 XOR points correctly, i.e. 0.75, and in practice such models often end up at 0.5.
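If you want to see that cap numerically, here is a throwaway sketch (my own check, not part of the classic argument): sample many random lines and confirm that none of them classifies more than 3 of the 4 XOR points correctly:

import numpy as np

pts = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
labels = np.array([0, 1, 1, 0])

best = 0.0
rng = np.random.default_rng(0)
for _ in range(10000):
    w = rng.normal(size=2)   # random line: w . x + b = 0
    b = rng.normal()
    pred = (pts @ w + b > 0).astype(int)
    # try both choices of which side of the line is class 1
    best = max(best, (pred == labels).mean(), ((1 - pred) == labels).mean())

print('best linear accuracy:', best)  # 0.75 - no line reaches 1.0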

To understand this better, let's look at a case where a model can't learn the data under these same circumstances:

import tensorflow as tf
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=np.float32)
y = np.array([0, 1, 1, 0], dtype=np.float32)

# a single Dense unit is just a linear model with a sigmoid on top,
# so it can only draw one straight line through the input space
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Dense(1, activation='sigmoid', input_shape=(2,)))

model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.1),
              loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X, y, epochs=100, verbose=0)
_, acc = model.evaluate(X, y)
print('acc = ' + str(acc))

which typically gives:

acc = 0.5

As you can see, this model can't classify even the data it has already seen. The reason is that the data is not linearly separable, and our model can only produce a linear decision boundary. As soon as we add a hidden layer to the network, it becomes able to solve this problem:

import tensorflow as tf
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=np.float32)
y = np.array([0, 1, 1, 0], dtype=np.float32)

# same linear output as before, but with a small hidden layer in front:
# the hidden ReLU units are what make a non-linear boundary possible
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Dense(4, activation='relu', input_shape=(2,)))
model.add(tf.keras.layers.Dense(1, activation='sigmoid'))

# optional: log the run to TensorBoard
tb_callback = tf.keras.callbacks.TensorBoard(log_dir='./test/', write_graph=True)

model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.1),
              loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X, y, epochs=1000, verbose=0, callbacks=[tb_callback])
_, acc = model.evaluate(X, y)
print('acc = ' + str(acc))

which (on a typical run) gives:

acc = 1.0

By adding a small hidden layer, the model learned what the single-layer version couldn't learn in 100 epochs (even though both had already seen all of the data).

So to sum up: yes, the dataset is so small that a network can easily memorize it, but the XOR problem is important because it shows that some models cannot fit even this tiny dataset, no matter how long they train.
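A quick way to see that the sklearn model from the question learned a decision boundary rather than a lookup table is to probe it on inputs it never saw. A sketch (it assumes the fitted MLPClassifier from the question; exact outputs depend on the random initialization):

# probe the fitted model on points near, but not equal to, the training rows
print(model.predict([[0.9, 0.1], [0.1, 0.9], [0.9, 0.9], [0.1, 0.1]]))
# on most runs this mirrors the XOR pattern, e.g. [1 1 0 0]: the learned
# boundary generalizes to nearby unseen points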

Having said that, there are variants of the XOR problem with proper train and test sets. Here is one (a continuous version, so the plot looks slightly different):

import numpy as np
import matplotlib.pyplot as plt

# red class: points in the second and fourth quadrants
x1 = np.concatenate([np.random.uniform(0, 100, 100), np.random.uniform(-100, 0, 100)])
y1 = np.concatenate([np.random.uniform(-100, 0, 100), np.random.uniform(0, 100, 100)])

# blue class: points in the first and third quadrants
x2 = np.concatenate([np.random.uniform(0, 100, 100), np.random.uniform(-100, 0, 100)])
y2 = np.concatenate([np.random.uniform(0, 100, 100), np.random.uniform(-100, 0, 100)])

plt.scatter(x1, y1, c='red')
plt.scatter(x2, y2, c='blue')
plt.show()

[scatter plot: red and blue points forming an XOR pattern across the four quadrants]
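With this continuous version there is genuinely unseen data to test on. A minimal sketch (assuming the x1/y1/x2/y2 arrays above plus sklearn): stack the two clouds into one labelled dataset, hold out a test set, and score an MLP on points it never trained on:

from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# red points are class 0, blue points are class 1; scale to a small range
X = np.concatenate([np.column_stack([x1, y1]), np.column_stack([x2, y2])]) / 100.0
y = np.concatenate([np.zeros(200), np.ones(200)])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000).fit(X_train, y_train)
print('test score:', clf.score(X_test, y_test))  # typically close to 1.0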

hope that helped ;))

Upvotes: 2
