I've been watching some videos on deep learning/convolutional neural networks (like here and here), and I tried to implement my own in C++. I kept the input data fairly simple for my first attempt: the idea is to differentiate between a cross and a circle. I have a small data set of around 25 of each (64*64 images), and they look like this:
The network itself is five layers:
Convolution (5 filters, size 3, stride 1, with a ReLU)
MaxPool (size 2)
Convolution (1 filter, size 3, stride 1, with a ReLU)
MaxPool (size 2)
Linear Regression classifier
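For what it's worth, assuming "valid" (no padding) convolutions and non-overlapping pooling, which is what the loops below appear to implement, the spatial sizes through this stack work out as follows (the helper names here are made up for illustration):

```cpp
// Output width/height of a "valid" (no padding) convolution.
int convOut(int inSize, int filterSize, int stride) {
    return (inSize - filterSize) / stride + 1;
}

// Output width/height of a non-overlapping max pool.
int poolOut(int inSize, int poolSize) {
    return inSize / poolSize; // integer division drops a trailing odd row/column
}

// For the 64x64 input above:
//   convOut(64, 3, 1) -> 62   (conv, 5 channels)
//   poolOut(62, 2)    -> 31
//   convOut(31, 3, 1) -> 29   (conv, 1 channel)
//   poolOut(29, 2)    -> 14   -> the classifier sees 14*14 = 196 inputs (+ bias)
```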
My issue is that my network isn't converging on anything. None of the weights appear to change, and if I run it the predictions mostly stay the same, apart from the occasional outlier which jumps up before returning on the next iteration.
The convolutional layer's training looks something like this:
// Yeah, I know I should change the shared_ptr<float>
void ConvolutionalNetwork::Train(std::shared_ptr<float> input, std::shared_ptr<float> outputGradients, float label)
{
    float biasGradient = 0.0f;
    // Calculate the deltas with respect to the input.
    for (int layer = 0; layer < m_Filters.size(); ++layer)
    {
        for (int z = 0; z < depth; ++z)
        for (int y = 0; y < height - m_FilterSize; ++y)
        for (int x = 0; x < width - m_FilterSize; ++x)
        {
            int newImageIndex = layer*m_OutputWidth*m_OutputHeight + y*m_OutputWidth + x;
            for (int v = 0; v < m_FilterSize; ++v)
            for (int u = 0; u < m_FilterSize; ++u)
            {
                // Find the index in the input image
                int imageIndex = x + (y + v)*m_OutputWidth + z*m_OutputHeight*m_OutputWidth;
                int kernelIndex = u + v*m_FilterSize + z*m_FilterSize*m_FilterSize;
                m_pGradients.get()[imageIndex] += outputGradients.get()[newImageIndex]*input.get()[imageIndex];
                m_GradientSum[layer].get()[kernelIndex] += m_pGradients.get()[imageIndex] * m_Filters[layer].get()[kernelIndex];
                biasGradient += m_GradientSum[layer].get()[kernelIndex];
            }
        }
    }
    // Update the weights
    for (int layer = 0; layer < m_Filters.size(); ++layer)
    {
        for (int z = 0; z < depth; ++z)
        for (int v = 0; v < m_FilterSize; ++v)
        for (int u = 0; u < m_FilterSize; ++u)
        {
            int kernelIndex = u + v*m_FilterSize + z*m_FilterSize*m_FilterSize;
            m_Filters[layer].get()[kernelIndex] -= learningRate*m_GradientSum[layer].get()[kernelIndex];
        }
        m_pBiases.get()[layer] -= learningRate*biasGradient;
    }
}
So I create a buffer (m_pGradients) with the dimensions of the input buffer to feed the gradients back to the previous layer, but use the gradient sum to adjust the weights.
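For comparison, the textbook backward pass for a single-channel "valid" convolution pairs the two products the other way around: the gradient with respect to the input multiplies the upstream gradient by the filter weight, and the gradient with respect to each weight multiplies it by the input pixel. A minimal sketch with free-standing arrays (all names here are illustrative, not taken from my classes):

```cpp
#include <vector>

// Reference backward pass for one "valid" 2-D convolution channel.
void convBackward(const std::vector<float>& input,    // W x H input image
                  const std::vector<float>& gradOut,  // (W-k+1) x (H-k+1) upstream gradient
                  const std::vector<float>& filter,   // k x k weights
                  std::vector<float>& gradIn,         // dLoss/dInput, same size as input
                  std::vector<float>& gradFilter,     // dLoss/dWeights, same size as filter
                  float& gradBias,
                  int W, int H, int k)
{
    int outW = W - k + 1;
    int outH = H - k + 1;
    for (int y = 0; y < outH; ++y)
        for (int x = 0; x < outW; ++x)
        {
            float g = gradOut[y*outW + x];
            gradBias += g;                        // the bias sees every output once
            for (int v = 0; v < k; ++v)
                for (int u = 0; u < k; ++u)
                {
                    int in = (y + v)*W + (x + u);
                    gradIn[in]          += g * filter[v*k + u]; // input grad uses the WEIGHT
                    gradFilter[v*k + u] += g * input[in];       // weight grad uses the INPUT
                }
        }
}
```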
The max pooling layer passes the gradients back like so (it saves the max indices and zeroes all the other gradients out):
void MaxPooling::Train(std::shared_ptr<float> input, std::shared_ptr<float> outputGradients, float label)
{
    for (int outputVolumeIndex = 0; outputVolumeIndex < m_OutputVolumeSize; ++outputVolumeIndex)
    {
        int inputIndex = m_Indices.get()[outputVolumeIndex];
        m_pGradients.get()[inputIndex] = outputGradients.get()[outputVolumeIndex];
    }
}
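This routing only works if the forward pass recorded the right indices, and if m_pGradients is zeroed before each backward pass (otherwise stale values from the previous example leak through). A sketch of a forward pass that saves the argmax per window, assuming non-overlapping 2x2 windows and even W and H (names are illustrative):

```cpp
#include <vector>

// Forward pass for a non-overlapping 2x2 max pool over one channel,
// saving the winning input index so the backward pass can route gradients.
void maxPoolForward(const std::vector<float>& input, int W, int H,
                    std::vector<float>& output, std::vector<int>& indices)
{
    int outW = W / 2, outH = H / 2;
    output.assign(outW * outH, 0.0f);
    indices.assign(outW * outH, 0);
    for (int y = 0; y < outH; ++y)
        for (int x = 0; x < outW; ++x)
        {
            int best = (2*y)*W + 2*x;          // start with the top-left pixel
            for (int v = 0; v < 2; ++v)
                for (int u = 0; u < 2; ++u)
                {
                    int idx = (2*y + v)*W + (2*x + u);
                    if (input[idx] > input[best]) best = idx;
                }
            output[y*outW + x]  = input[best];
            indices[y*outW + x] = best;  // backward: gradIn[best] = gradOut[y*outW + x]
        }
}
```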
And the final regression layer calculates its gradients like this:
void LinearClassifier::Train(std::shared_ptr<float> data, std::shared_ptr<float> output, float y)
{
    float* x = data.get();
    float biasError = 0.0f;
    float h = Hypothesis(output) - y;
    for (int i = 1; i < m_NumberOfWeights; ++i)
    {
        float error = h*x[i];
        m_pGradients.get()[i] = error;
        biasError += error;
    }
    float cost = h;
    m_Error = cost*cost;
    for (int theta = 1; theta < m_NumberOfWeights; ++theta)
    {
        m_pWeights.get()[theta] = m_pWeights.get()[theta] - learningRate*m_pGradients.get()[theta];
    }
    m_pWeights.get()[0] -= learningRate*biasError;
}
After 100 iterations of training on the two examples, the prediction for each is the same as the other and unchanged from the start.
- Should a convolutional network like this be able to discriminate between the two classes?
Yes. In fact, even a linear classifier by itself should be able to discriminate very easily (if the images are more or less centered).
- Is this the correct approach?
The most probable cause is an error in your gradient formulas. Always follow two easy rules:
Always check your gradients numerically. This is so easy to do and will save you hours of debugging! Recall from analysis that
[grad f(x)]_i ~ (f(x + eps*e_i) - f(x - eps*e_i)) / (2*eps)
where by []_i I mean the i'th coordinate, and by e_i I mean the i'th canonical vector (a zero vector with a one in the i'th coordinate).
- Should I be accounting for the ReLU (max) in the convolution layer backpropagation?
Yes, the ReLU alters your gradient, as it is a nonlinearity which you need to differentiate. Again, back to point 1: start with simple models, and add each element separately to find which one causes your gradients/model to crash.