Reputation: 614
I'm trying to tackle the classic handwritten-digit recognition problem with a feedforward neural network and backpropagation, using the MNIST dataset. I'm using Michael Nielsen's book to learn the essentials and 3Blue1Brown's YouTube video for the backpropagation algorithm.
I finished writing it some time ago and have been debugging it since, because the results are quite bad. At best the network recognizes ~4000/10000 samples after 1 epoch, and that number only drops over the following epochs, which led me to believe there's some issue with the backpropagation algorithm. I've been drowning in index hell trying to debug this for the last few days and can't figure out where the issue is; I'd appreciate any help in pointing it out.
A bit of background: 1) I'm not using matrix multiplication or any external frameworks; I'm doing everything with for loops, because that's how I learned it from the video. 2) Unlike the book, I'm storing the weights and biases in the same array: the biases for every layer are a column at the end of that layer's weight matrix (see the short sketch below).
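To illustrate that layout (an illustrative sketch of mine, not code from the repo): a layer mapping 3 activations to 2 neurons is stored as a 2×4 matrix whose last column holds the biases:

double[,] layer = new double[2, 4]; // columns 0..2: weights, column 3: biases
// Weighted input of neuron i: z[i] = layer[i, 0] * a[0] + layer[i, 1] * a[1] + layer[i, 2] * a[2] + layer[i, 3]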
And finally, the code. This is the Backpropagate method of the NeuralNetwork class, which is called in UpdateMiniBatch, which is itself called in SGD:
/// <summary>
/// Returns the partial derivative of the cost function on one sample with respect to every weight in the network.
/// </summary>
public List<double[,]> Backpropagate(ITrainingSample sample)
{
    // Forward pass
    var (weightedInputs, activations) = GetWeightedInputsAndActivations(sample.Input);
    // The derivative with respect to the activation of the last layer is simple to compute: activation - expectedActivation
    var errors = activations.Last().Select((a, i) => a - sample.Output[i]).ToArray();
    // Backward pass
    List<double[,]> delCostOverDelWeights = Weights.Select(x => new double[x.GetLength(0), x.GetLength(1)]).ToList();
    List<double[]> delCostOverDelActivations = Weights.Select(x => new double[x.GetLength(0)]).ToList();
    delCostOverDelActivations[delCostOverDelActivations.Count - 1] = errors;
    // Comment notation:
    // Cost function: C
    // Weight connecting the i-th neuron on the (l + 1)-th layer to the j-th neuron on the l-th layer: w[l][i, j]
    // Bias of the i-th neuron on the (l + 1)-th layer: b[l][i]
    // Activation of the i-th neuron on the l-th layer: a[l][i]
    // Weighted input of the i-th neuron on the l-th layer: z[l][i] (meaningless on layer 0, but kept for index convenience)
    // Note that weights, biases, delCostOverDelWeights and delCostOverDelActivations all start at layer 1
    // (the 0-th layer is irrelevant to their meaning), while activations and weightedInputs start at the 0-th layer.
    for (int l = Weights.Count - 1; l >= 0; l--)
    {
        // Calculate ∂C/∂w for the current layer:
        for (int i = 0; i < Weights[l].GetLength(0); i++)
            for (int j = 0; j < Weights[l].GetLength(1); j++)
                delCostOverDelWeights[l][i, j] =                                // ∂C/∂w[l][i, j]
                    delCostOverDelActivations[l][i] *                           // ∂C/∂a[l + 1][i]
                    SigmoidPrime(weightedInputs[l + 1][i]) *                    // ∂a[l + 1][i]/∂z[l + 1][i] = σ′(z[l + 1][i])
                    (j < Weights[l].GetLength(1) - 1 ? activations[l][j] : 1);  // ∂z[l + 1][i]/∂w[l][i, j] = a[l][j], or ∂z[l + 1][i]/∂b[l][i] = 1
        // Calculate ∂C/∂a for the previous layer (a[l]):
        if (l != 0)
            for (int i = 0; i < Weights[l - 1].GetLength(0); i++)
                for (int j = 0; j < Weights[l].GetLength(0); j++)
                    delCostOverDelActivations[l - 1][i] +=                      // ∂C/∂a[l][i] = sum over j of:
                        delCostOverDelActivations[l][j] *                       // ∂C/∂a[l + 1][j]
                        SigmoidPrime(weightedInputs[l + 1][j]) *                // ∂a[l + 1][j]/∂z[l + 1][j] = σ′(z[l + 1][j])
                        Weights[l][j, i];                                       // ∂z[l + 1][j]/∂a[l][i] = w[l][j, i]
    }
    return delCostOverDelWeights;
}
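For reference, the two loop nests above implement these chain-rule expressions, written in the comment notation:

∂C/∂w[l][i, j] = ∂C/∂a[l + 1][i] · σ′(z[l + 1][i]) · a[l][j]   (with a[l][j] replaced by 1 for the bias column)
∂C/∂a[l][i] = Σ over j of ( ∂C/∂a[l + 1][j] · σ′(z[l + 1][j]) · w[l][j, i] )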
GetWeightedInputsAndActivations:
public (List<double[]>, List<double[]>) GetWeightedInputsAndActivations(double[] input)
{
    List<double[]> activations = new List<double[]>() { input }.Concat(Weights.Select(x => new double[x.GetLength(0)])).ToList();
    List<double[]> weightedInputs = activations.Select(x => new double[x.Length]).ToList();
    for (int l = 0; l < Weights.Count; l++)
        for (int i = 0; i < Weights[l].GetLength(0); i++)
        {
            double value = 0;
            for (int j = 0; j < Weights[l].GetLength(1) - 1; j++)
                value += Weights[l][i, j] * activations[l][j];                             // weights
            weightedInputs[l + 1][i] = value + Weights[l][i, Weights[l].GetLength(1) - 1]; // bias
            activations[l + 1][i] = Sigmoid(weightedInputs[l + 1][i]);
        }
    return (weightedInputs, activations);
}
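For reference, one way to sanity-check Backpropagate independently of training accuracy is a central finite-difference comparison. This is a minimal sketch under my assumptions (a quadratic cost consistent with the errors line in Backpropagate); the Cost and GradientCheck helpers are hypothetical and would sit inside the same NeuralNetwork class:

// Hypothetical helper: quadratic cost on one sample, matching errors = activation - expectedActivation.
public double Cost(ITrainingSample sample)
{
    var (_, activations) = GetWeightedInputsAndActivations(sample.Input);
    return activations.Last().Select((a, i) => 0.5 * (a - sample.Output[i]) * (a - sample.Output[i])).Sum();
}

// Hypothetical helper: compares the analytic gradient of one weight against a central finite difference.
public void GradientCheck(ITrainingSample sample, int l, int i, int j, double eps = 1e-6)
{
    double analytic = Backpropagate(sample)[l][i, j];
    double original = Weights[l][i, j];
    Weights[l][i, j] = original + eps;
    double costPlus = Cost(sample);
    Weights[l][i, j] = original - eps;
    double costMinus = Cost(sample);
    Weights[l][i, j] = original; // restore the weight
    double numeric = (costPlus - costMinus) / (2 * eps);
    Console.WriteLine($"analytic: {analytic}  numeric: {numeric}"); // these should agree closely if the backward pass is right
}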
The entire NeuralNetwork class, as well as everything else, can be found here.
EDIT: After many significant changes to the repo, the above link might no longer be functional, but that should hopefully be irrelevant given the answer. For completeness' sake, this is a functional link to the changed repository.
Upvotes: 8
Views: 351
Reputation: 614
Fixed. The issue was that I didn't divide the pixel inputs by 255. Everything else seems to work correctly, and I'm now getting 9000+/10000 on the first epoch.
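With raw inputs in the 0–255 range, the weighted inputs are large, the sigmoid saturates, σ′(z) is effectively zero, and the gradients vanish, which is consistent with the symptoms above. A minimal sketch of the scaling (variable names are illustrative, not from the repo):

double[] input = rawPixels.Select(p => p / 255.0).ToArray(); // map 0..255 pixel values into 0..1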
Upvotes: 4