Reputation: 115
I'm a computer science teacher currently building an introductory course on deep learning. Python and the Keras framework are my tools of choice.
I'd like to show my students what overfitting is by training models of increasing complexity on some predefined 2D data, just like at the end of this example.
The same idea appears in a programming activity for Andrew Ng's course on neural networks tuning.
However, no matter how hard I try, I can't replicate this behavior with Keras. Using the same dataset and hyperparameters, the decision boundaries are always "smoother" and the model never fits the noisy points in the dataset. See my results below and click here to browse the associated code. Here's the relevant extract:
# Varying the hidden layer size to observe underfitting and overfitting
plt.figure(figsize=(16, 32))
hidden_layer_dimensions = [1, 2, 3, 4, 5, 20, 50]
for i, hidden_layer_size in enumerate(hidden_layer_dimensions):
    fig = plt.subplot(4, 2, i+1)
    plt.title('Hidden Layer size: {:d}'.format(hidden_layer_size))

    model = Sequential()
    model.add(Dense(hidden_layer_size, activation='tanh', input_shape=(2,)))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(SGD(lr=1.0), 'binary_crossentropy', metrics=['accuracy'])
    history = model.fit(data, targets, verbose=0, epochs=50)

    plot_decision_boundary(lambda x: model.predict(x) > 0.5, data, targets, fig)
Am I doing something wrong? Are there some internal optimization mechanisms in Keras kicking in? Can I mitigate them with other compilation choices?
Upvotes: 1
Views: 773
Reputation: 115
I finally managed to obtain overfitting on my data by significantly increasing the number of gradient descent steps (i.e. parameter updates). It works pretty well with both the tanh and ReLU activation functions.
Here's the updated line:
history = model.fit(x_train, y_train, verbose=0, epochs=5000, batch_size=200)
The complete code is here and gives the following result.
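As a side note (this is not in the original notebook), a quick way to verify that the long training run really overfits is to hold out a small validation set and compare the two accuracy curves. Here is a minimal sketch, assuming `x_train`/`y_train` are the arrays used above; the split size and the use of `train_test_split` are just illustrative choices:

```python
from sklearn.model_selection import train_test_split

# Shuffle and hold out 20% of the points so overfitting shows up as a gap
# between training and validation accuracy.
x_tr, x_val, y_tr, y_val = train_test_split(x_train, y_train,
                                            test_size=0.2, random_state=0)

history = model.fit(x_tr, y_tr, verbose=0, epochs=5000, batch_size=200,
                    validation_data=(x_val, y_val))

# The metric key is 'acc' in older standalone Keras and 'accuracy' in recent
# tf.keras, so look it up instead of hard-coding it.
acc_key = 'acc' if 'acc' in history.history else 'accuracy'
plt.plot(history.history[acc_key], label='train accuracy')
plt.plot(history.history['val_' + acc_key], label='validation accuracy')
plt.legend()
plt.show()
```

An overfitted model keeps improving on the training set while the validation accuracy stalls or degrades.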
Upvotes: 0
Reputation: 136
You can also increase the number of epochs and use 'relu' as the activation function in order to get sharp edges, like in Andrew Ng's example. I ran your notebook under Colaboratory with a 1-layer network of 50 neurons and added noise to your moons in order to get separate colored areas. Please have a look, and don't forget to activate the GPU (Runtime / Change runtime type).
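The noisier moons mentioned above could be generated along these lines; this is only a sketch, and the `noise` value is an illustration rather than the exact one used in the linked notebook:

```python
from sklearn.datasets import make_moons

# More noise makes some points land inside the opposite class's region,
# which is exactly what a high-capacity model can then overfit to.
data, targets = make_moons(n_samples=500, noise=0.3, random_state=0)
```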
# Varying the hidden layer size to observe underfitting and overfitting
plt.figure(figsize=(16, 32))
hidden_layer_dimensions = [50]
for i, hidden_layer_size in enumerate(hidden_layer_dimensions):
    fig = plt.subplot(4, 2, i+1)
    plt.title('Hidden Layer size: {:d}'.format(hidden_layer_size))

    model = Sequential()
    model.add(Dense(hidden_layer_size, activation='relu', input_shape=(2,)))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(SGD(lr=1.0), 'binary_crossentropy', metrics=['accuracy'])
    history = model.fit(data, targets, verbose=0, epochs=5000)

    plot_decision_boundary(lambda x: model.predict(x) > 0.5, data, targets, fig)
5000 epochs + relu (looks like what you want)
5000 epochs + tanh (tanh smooths the curve too much for your purpose)
Upvotes: 1
Reputation: 13600
Your problem is that all of your examples are one-layer neural networks that only differ in size! If you print the weights, you'll notice that when you increase the layer size (for example from 5 to 50), the extra neurons (45 of them in this example) end up with weights near zero, so the models are effectively the same.
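To check this yourself, here is a minimal sketch, assuming `model` is one of the wide single-layer models trained in your loop (the 1e-2 threshold is an arbitrary illustration):

```python
import numpy as np

# get_weights() returns [kernel, bias] for a Dense layer; the kernel has
# shape (2, hidden_layer_size), one column per hidden neuron.
kernel, bias = model.layers[0].get_weights()

# Columns whose weights are all near zero belong to neurons that contribute
# almost nothing to the decision boundary.
near_zero = np.all(np.abs(kernel) < 1e-2, axis=0)
print('{} of {} hidden neurons have near-zero input weights'.format(
    near_zero.sum(), kernel.shape[1]))
```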
You have to increase the depth of your neural network to see the overfitting. For example, I changed your code so that the first two examples are single-layer NNs and the third one ([30, 30, 30, 30]) is a four-layer NN (the full source code is here):
# Generate 2D data with fewer samples and more class overlap
# data, targets = make_moons(500, noise=0.45)
from sklearn.datasets import make_moons, make_classification
data, targets = make_classification(n_samples=200, n_features=2, n_redundant=0,
                                    n_informative=2, random_state=2,
                                    n_clusters_per_class=2)
plot_data(data, targets)

plt.figure(figsize=(16, 32))
hidden_layer_dimensions = [[2], [20], [30, 30, 30, 30]]
for i, hidden_layer_sizes in enumerate(hidden_layer_dimensions):
    fig = plt.subplot(4, 2, i+1)
    plt.title('Hidden Layer size: {}'.format(str(hidden_layer_sizes)))

    model = Sequential()
    for j, layer_size in enumerate(hidden_layer_sizes):
        if j == 0:
            model.add(Dense(layer_size, activation='tanh', input_shape=(2,)))
        else:
            model.add(Dense(layer_size, activation='tanh'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(SGD(lr=0.1), 'binary_crossentropy', metrics=['accuracy'])
    history = model.fit(data, targets, verbose=0, epochs=500)

    plot_decision_boundary(lambda x: model.predict(x) > 0.5, data, targets, fig)
You can also achieve your goal using TensorFlow Playground. Please check it out! It has a nice interactive UI.
Upvotes: 1