LomaxOnTheRun

Reputation: 702

Theano/Lasagne basic neural network with regression won't overfit dataset of size one

I'm using a basic neural network in Theano/Lasagne to try to identify facial keypoints in images, and am currently trying to get it to learn a single image (I've just taken the first image from my training set). The images are 96x96 pixels, and there are 30 key points (outputs) that it needs to learn, but it fails to do so. This is my first attempt at using Theano/Lasagne, so I'm sure I've just missed something obvious, but I can't see what I've done wrong:

import sys
import os
import time

import numpy as np
import theano
import theano.tensor as T

import lasagne
import pickle

import matplotlib.pyplot as plt

def load_data():
    with open('FKD.pickle', 'rb') as f:
        save = pickle.load(f)
        trainDataset = save['trainDataset'] # (5000, 1, 96, 96) np.ndarray of pixel values [-1,1]
        trainLabels = save['trainLabels']   # (5000, 30) np.ndarray of target values [-1,1]
        del save  # Hint to help garbage collection free up memory

        # Overtrain on dataset of 1
        trainDataset = trainDataset[:1]
        trainLabels = trainLabels[:1]

    return trainDataset, trainLabels


def build_mlp(input_var=None):

    relu = lasagne.nonlinearities.rectify  # no longer used (hidden layer removed, see edit below)
    softmax = lasagne.nonlinearities.softmax

    network = lasagne.layers.InputLayer(shape=(None, 1, imageSize, imageSize), input_var=input_var)
    network = lasagne.layers.DenseLayer(network, num_units=numLabels, nonlinearity=softmax)

    return network

def main(num_epochs=500, minibatch_size=500):
    # NB: minibatch_size is currently unused; the whole (1-image) set is fed to train_fn each epoch

    # Load the dataset
    print "Loading data..."
    X_train, y_train = load_data()

    # Prepare Theano variables for inputs and targets
    input_var = T.tensor4('inputs')
    target_var = T.matrix('targets')

    # Create neural network model
    network = build_mlp(input_var)

    # Create a loss expression for training, the mean squared error (MSE)
    prediction = lasagne.layers.get_output(network)
    loss = lasagne.objectives.squared_error(prediction, target_var)
    loss = loss.mean()

    # Create update expressions for training
    params = lasagne.layers.get_all_params(network, trainable=True)
    updates = lasagne.updates.nesterov_momentum(loss, params, learning_rate=0.01, momentum=0.9)

    # Compile a function performing a training step on a mini-batch
    train_fn = theano.function([input_var, target_var], loss, updates=updates)

    # Collect points for final plot
    train_err_plot = []

    # Finally, launch the training loop.
    print "Starting training..."

    # We iterate over epochs:
    for epoch in range(num_epochs):
        # In each epoch, we do a full pass over the training data:
        start_time = time.time()
        train_err = train_fn(X_train, y_train)

        # Then we print the results for this epoch:
        print "Epoch %s of %s took %.3fs" % (epoch+1, num_epochs, time.time()-start_time)
        print "  training loss:\t\t%s" % train_err

        # Save accuracy to show later
        train_err_plot.append(train_err)

    # Show plot
    plt.plot(train_err_plot)
    plt.title('Graph')
    plt.xlabel('Epochs')
    plt.ylabel('Training loss')
    plt.tight_layout()
    plt.show()

imageSize = 96
numLabels = 30

if __name__ == '__main__':
    main(minibatch_size=1)

This gives me a graph that looks like this:

[Plot: training loss over 500 epochs, levelling off well above zero]

I'm pretty sure this network should be able to get the loss down to essentially zero. I'd appreciate any help or thoughts on the matter :)

EDIT: Removed dropout and hidden layer to simplify the problem.

Upvotes: 1

Views: 300

Answers (1)

LomaxOnTheRun

Reputation: 702

It turns out that I'd forgotten to change the output layer's nonlinearity from:

lasagne.nonlinearities.softmax

to:

lasagne.nonlinearities.linear

The code I was using as a base was written for a classification problem (e.g. working out which digit a picture shows), whereas mine is a regression problem (finding the coordinates of certain features in an image). There are several useful output nonlinearities for classification, of which softmax is one, but softmax constrains the outputs to be non-negative and to sum to 1, so it can never match arbitrary target values in [-1, 1]. A regression problem needs a linear (identity) output, which leaves the predictions unconstrained.
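For reference, the fix is a one-line change to build_mlp above; the output layer becomes:

network = lasagne.layers.DenseLayer(network, num_units=numLabels,
                                    nonlinearity=lasagne.nonlinearities.linear)  # identity output

And here's a quick standalone sketch (plain NumPy; the softmax helper is just for illustration, not part of the original code) showing why softmax outputs could never reach the targets:

import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # shift by the max for numerical stability
    return e / e.sum()

out = softmax(np.random.randn(30))
print out.min()  # always >= 0
print out.sum()  # always 1.0
# Targets such as -0.5 lie outside this range, so the MSE stays bounded away from zero.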

Hope this helps someone else in the future :)

Upvotes: 1
