I'm using a basic neural network in Theano/Lasagne to try to identify facial keypoints in images, and am currently trying to get it to learn a single image (I've just taken the first image from my training set). The images are 96x96 pixels and there are 30 keypoint outputs that it needs to learn, but it fails to learn them. This is my first attempt at using Theano/Lasagne, so I'm sure I've just missed something obvious, but I can't see what I've done wrong:
import sys
import os
import time
import numpy as np
import theano
import theano.tensor as T
import lasagne
import pickle
import matplotlib.pyplot as plt


def load_data():
    with open('FKD.pickle', 'rb') as f:
        save = pickle.load(f)
        trainDataset = save['trainDataset']  # (5000, 1, 96, 96) np.ndarray of pixel values [-1,1]
        trainLabels = save['trainLabels']  # (5000, 30) np.ndarray of target values [-1,1]
        del save  # Hint to help garbage collection free up memory

    # Overtrain on dataset of 1
    trainDataset = trainDataset[:1]
    trainLabels = trainLabels[:1]

    return trainDataset, trainLabels


def build_mlp(input_var=None):
    relu = lasagne.nonlinearities.rectify
    softmax = lasagne.nonlinearities.softmax

    network = lasagne.layers.InputLayer(shape=(None, 1, imageSize, imageSize), input_var=input_var)
    network = lasagne.layers.DenseLayer(network, num_units=numLabels, nonlinearity=softmax)

    return network


def main(num_epochs=500, minibatch_size=500):
    # Load the dataset
    print "Loading data..."
    X_train, y_train = load_data()

    # Prepare Theano variables for inputs and targets
    input_var = T.tensor4('inputs')
    target_var = T.matrix('targets')

    # Create neural network model
    network = build_mlp(input_var)

    # Create a loss expression for training, the mean squared error (MSE)
    prediction = lasagne.layers.get_output(network)
    loss = lasagne.objectives.squared_error(prediction, target_var)
    loss = loss.mean()

    # Create update expressions for training
    params = lasagne.layers.get_all_params(network, trainable=True)
    updates = lasagne.updates.nesterov_momentum(loss, params, learning_rate=0.01, momentum=0.9)

    # Compile a function performing a training step on a mini-batch
    train_fn = theano.function([input_var, target_var], loss, updates=updates)

    # Collect points for final plot
    train_err_plot = []

    # Finally, launch the training loop.
    print "Starting training..."
    # We iterate over epochs:
    for epoch in range(num_epochs):
        # In each epoch, we do a full pass over the training data:
        start_time = time.time()
        train_err = train_fn(X_train, y_train)

        # Then we print the results for this epoch:
        print "Epoch %s of %s took %.3fs" % (epoch+1, num_epochs, time.time()-start_time)
        print "  training loss:\t\t%s" % train_err

        # Save the training loss to plot later
        train_err_plot.append(train_err)

    # Show plot
    plt.plot(train_err_plot)
    plt.title('Graph')
    plt.xlabel('Epochs')
    plt.ylabel('Training loss')
    plt.tight_layout()
    plt.show()


imageSize = 96
numLabels = 30

if __name__ == '__main__':
    main(minibatch_size=1)
This gives me a graph that looks like this:
[Figure: training loss vs. epochs]
I'm pretty sure this network should be able to get the loss down to basically zero. I'd appreciate any help or thoughts on the matter :)
EDIT: Removed dropout and hidden layer to simplify the problem.
It turns out that I'd forgotten to change the nonlinearity of the output layer (the nonlinearity argument of the final DenseLayer in build_mlp) from:
lasagne.nonlinearities.softmax
to:
lasagne.nonlinearities.linear
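To see why this one argument matters, here is a minimal pure-Python sketch (no Theano) of what the two output functions do. The softmax and linear functions below are my own stand-ins written to mirror lasagne.nonlinearities.softmax and lasagne.nonlinearities.linear, not the Lasagne code itself, and softmax_mse_lower_bound is a hypothetical helper. Because softmax outputs are always positive and always sum to 1, it computes a floor on the mean squared error that no amount of training can get below for a given target vector; the targets used are made up for illustration.

```python
import math


def softmax(z):
    """Stand-in for a softmax output: positive values that sum to 1."""
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def linear(z):
    """Stand-in for a linear (identity) output: any real value passes through."""
    return list(z)


def softmax_mse_lower_bound(targets):
    """Hypothetical helper: a floor on the MSE that any softmax output can reach.

    - Softmax outputs are > 0, so each negative target t costs more than t**2.
    - Softmax outputs sum to 1, so the errors sum to 1 - sum(targets); by
      Cauchy-Schwarz the squared errors then sum to at least (1 - sum)**2 / n.
    """
    n = len(targets)
    negative_part = sum(t * t for t in targets if t < 0)
    sum_part = (1.0 - sum(targets)) ** 2 / n
    return max(negative_part, sum_part) / n


if __name__ == "__main__":
    z = [2.0, -1.0, 0.5, -3.0]
    print(softmax(z))  # all positive, sums to 1
    print(linear(z))   # [2.0, -1.0, 0.5, -3.0]
    targets = [-0.5, 0.25, 0.75, -1.0]  # made-up targets in [-1, 1]
    print(softmax_mse_lower_bound(targets))  # 0.3125
```

With those example targets, a softmax output can never get the mean squared error below 0.3125 however long it trains, whereas a linear output can in principle reach exactly zero.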
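The code I was using as a base was for a classification problem (e.g. working out which digit a picture shows), whereas I was using the network for a regression problem (e.g. finding where certain features in an image are located). Softmax is one of several output functions that suit classification: it turns the outputs into positive values that sum to 1, like probabilities over the classes. My targets are 30 independent values in [-1, 1], which can be negative and needn't sum to 1, so a softmax output can never match them. Regression problems generally need a linear output function instead, since it can produce any real value rather than being squashed into a fixed range.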
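To check this end to end without Theano, here is a minimal pure-Python sketch that mimics the simplified network: one dense layer with mean squared error, trained on a single example. The toy sizes (8 inputs, 4 outputs), the hand-picked data, the learning rate, and the use of plain gradient descent instead of Nesterov momentum are all assumptions made for illustration, and train_single_example is a hypothetical helper, not part of Lasagne.

```python
import math


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def train_single_example(output, steps=500, lr=0.05):
    """Fit one dense layer to a single (x, y) pair with MSE and plain gradient descent.

    output is "linear" or "softmax". Returns the final mean squared error.
    """
    x = [0.5, -0.25, 1.0, 0.0, -0.75, 0.3, -0.1, 0.8]  # made-up input features
    y = [-0.5, 0.25, 0.75, -1.0]  # made-up regression targets in [-1, 1]
    n_in, n_out = len(x), len(y)
    W = [[0.0] * n_in for _ in range(n_out)]
    b = [0.0] * n_out

    def forward():
        z = [sum(W[j][i] * x[i] for i in range(n_in)) + b[j] for j in range(n_out)]
        return softmax(z) if output == "softmax" else z

    for _ in range(steps):
        out = forward()
        # Gradient of mean((out - y)**2) with respect to each output
        g_out = [2.0 * (out[j] - y[j]) / n_out for j in range(n_out)]
        if output == "softmax":
            # Backpropagate through softmax: dL/dz_j = out_j * (g_j - sum_k g_k * out_k)
            dot = sum(g * o for g, o in zip(g_out, out))
            g_z = [out[j] * (g_out[j] - dot) for j in range(n_out)]
        else:
            g_z = g_out  # identity output: the gradient passes straight through
        for j in range(n_out):
            for i in range(n_in):
                W[j][i] -= lr * g_z[j] * x[i]
            b[j] -= lr * g_z[j]

    out = forward()
    return sum((o - t) ** 2 for o, t in zip(out, y)) / n_out


if __name__ == "__main__":
    print("linear :", train_single_example("linear"))   # essentially 0
    print("softmax:", train_single_example("softmax"))  # stuck above 0.3
```

Same data, same loss, same optimiser: the only difference is the output function, and only the linear version can drive the error to zero.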
Hope this helps someone else in the future :)