How to Increase the Scope of Images a Neural Network Can Recognize?

Question

I am working on an image recognition neural network with PyTorch. My goal is to take pictures of handwritten math equations, process them, and use the neural network to recognize each element. I've reached the point where I can separate every variable, number, or symbol from the equation, and everything is ready to be sent through the neural network. I've trained my network to recognize numbers quite well (this part was quite easy), but now I want to expand its scope to recognize letters as well as numbers. I loaded handwritten letters along with the numbers into tensors, shuffled the elements, and put them into batches. No matter how I vary my learning rate, my architecture (hidden layers and the number of neurons per layer), or my batch size, I cannot get the neural network to recognize letters.

Here is my network architecture and the feed-forward function (you can see I experimented with the number of hidden layers):

class NeuralNetwork(nn.Module):

    def __init__(self):

        super().__init__()
        inputNeurons, hiddenNeurons, outputNeurons = 784, 700, 36

        # Create tensors for the weights
        self.layerOne = nn.Linear(inputNeurons, hiddenNeurons)
        self.layerTwo = nn.Linear(hiddenNeurons, hiddenNeurons)
        self.layerThree = nn.Linear(hiddenNeurons, outputNeurons)
        #self.layerFour = nn.Linear(hiddenNeurons, outputNeurons)
        #self.layerFive = nn.Linear(hiddenNeurons, outputNeurons)

    # Create function for Forward propagation
    def Forward(self, input):

        # Begin Forward propagation
        input = torch.sigmoid(self.layerOne(torch.sigmoid(input)))
        input = torch.sigmoid(self.layerTwo(input))
        input = torch.sigmoid(self.layerThree(input))
        #input = torch.sigmoid(self.layerFour(input))
        #input = torch.sigmoid(self.layerFive(input))

        return input

And this is the training code block (the data is shuffled in a dataloader, the ground truths are shuffled in the same order, the batch size is 10, and the total number of letter and number data points is 244800):

neuralNet = NeuralNetwork()
params = list(neuralNet.parameters())
criterion = nn.MSELoss()
print(neuralNet)

dataSet = next(iter(imageDataLoader))
groundTruth = next(iter(groundTruthsDataLoader))

for i in range(15):

    for k in range(24480):

        neuralNet.zero_grad()

        prediction = neuralNet.Forward(dataSet)
        loss = criterion(prediction, groundTruth)
        loss.backward()

        for layer in range(len(params)):

            # Updating the weights of the neural network
            params[layer].data.sub_(params[layer].grad.data * learningRate)

Thanks for the help in advance!

Prajot Kuvalekar · Accepted Answer

The first thing I would recommend is writing clean PyTorch code.

For example, your NeuralNetwork class should define a forward method (with a lowercase f), so that you call the network as prediction = neuralNet(dataSet) rather than prediction = neuralNet.Forward(dataSet). The reason is that any hooks registered on the network are not dispatched when you call prediction = neuralNet.Forward(dataSet) directly. For more details, refer to this link.
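To illustrate the point about hooks, here is a minimal sketch that reuses the layer sizes from the question. The hook itself, the hook_calls list, and the random input batch are assumptions added only for demonstration; they are not part of the original code.

```python
import torch
import torch.nn as nn


class NeuralNetwork(nn.Module):
    def __init__(self, input_neurons=784, hidden_neurons=700, output_neurons=36):
        super().__init__()
        self.layerOne = nn.Linear(input_neurons, hidden_neurons)
        self.layerTwo = nn.Linear(hidden_neurons, hidden_neurons)
        self.layerThree = nn.Linear(hidden_neurons, output_neurons)

    # Lowercase "forward" is the method that nn.Module.__call__ dispatches to
    def forward(self, x):
        x = torch.sigmoid(self.layerOne(torch.sigmoid(x)))
        x = torch.sigmoid(self.layerTwo(x))
        return torch.sigmoid(self.layerThree(x))


neuralNet = NeuralNetwork()

# Hypothetical hook that records the output shape each time it fires
hook_calls = []
handle = neuralNet.register_forward_hook(
    lambda module, inputs, output: hook_calls.append(tuple(output.shape))
)

batch = torch.rand(10, 784)  # stand-in for one batch of flattened 28x28 images

prediction = neuralNet(batch)        # goes through __call__, so the hook fires
calls_after_call = len(hook_calls)

_ = neuralNet.forward(batch)         # calls the method directly, so the hook is skipped
calls_after_direct = len(hook_calls)

handle.remove()
print(calls_after_call, calls_after_direct)
```

Calling the module instance is the conventional way to run a forward pass in PyTorch; invoking the method by name works numerically but bypasses the hook machinery.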

Second, since your dataset is not balanced, try undersampling or oversampling methods, which should be very helpful in your case.
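One possible way to oversample in PyTorch is WeightedRandomSampler, which draws rare classes more often. The sketch below uses made-up data (900 samples of one class and 100 of another) purely as an assumption to show the mechanics; the real class counts in your dataset will differ, and the variable names are hypothetical.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

torch.manual_seed(0)

# Hypothetical imbalanced data: 900 samples of class 0 and 100 samples of class 1
images = torch.rand(1000, 784)
labels = torch.cat([
    torch.zeros(900, dtype=torch.long),
    torch.ones(100, dtype=torch.long),
])

# Weight each sample by the inverse frequency of its class
class_counts = torch.bincount(labels)
sample_weights = 1.0 / class_counts[labels].float()

# Sampling with replacement lets rare-class samples be drawn repeatedly
sampler = WeightedRandomSampler(
    weights=sample_weights, num_samples=len(labels), replacement=True
)
loader = DataLoader(TensorDataset(images, labels), batch_size=10, sampler=sampler)

# Count how often each class is actually drawn over one pass
drawn = torch.cat([batch_labels for _, batch_labels in loader])
drawn_counts = torch.bincount(drawn, minlength=2)
print(drawn_counts)
```

Because the sampler already randomizes the order, shuffle should not be set on the DataLoader at the same time. Images and labels also live in a single dataset here, so they stay paired without needing two separately shuffled loaders.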
