Young Avassalador

Reputation: 11

Neural Network - Train an MLP with multiple entries

I implemented an MLP with the backpropagation algorithm. It works fine for a single entry: for example, if the input is 1 and 1, the outputs on the last layer will be 1 and 0.

Now suppose that instead of having only one entry (like 1,1) I have four entries (1,1; 1,0; 0,0; 0,1), each with a different expected answer.

I need to train this MLP so that it answers correctly for all of the entries.

I can't find a way to do this. Suppose I have 1000 epochs: would I need to train each entry for 250 epochs, or train one epoch with one entry and the next epoch with another?

How can I properly train an MLP to answer correctly for all entries?

Upvotes: 0

Views: 268

Answers (1)

Sebastian

Reputation: 23

At least for a Python implementation, you can simply use multidimensional training data:

# training a neural network to behave like an XOR gate
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoidPrime(s):
    # derivative of the sigmoid, expressed in terms of the activated output s
    return s * (1 - s)

X = np.array([[1,0],[0,1],[1,1],[0,0]]) # entries
y = np.array([[1],[1],[0],[0]]) # expected answers

INPUTS = X.shape[1]
HIDDEN = 12
OUTPUTS = y.shape[1]

# He-style initialization, scaled to the fan-in of each layer
w1 = np.random.randn(INPUTS, HIDDEN) * np.sqrt(2 / INPUTS)
w2 = np.random.randn(HIDDEN, OUTPUTS) * np.sqrt(2 / HIDDEN)

ALPHA = 0.5
EPOCHS = 1000

for e in range(EPOCHS):
    # forward pass: every entry in X is propagated at once
    z1 = sigmoid(X.dot(w1))
    o = sigmoid(z1.dot(w2))

    # backward pass
    o_error = o - y
    o_delta = o_error * sigmoidPrime(o)

    # the hidden-layer error must be computed with w2 *before* it is updated
    z1_error = o_delta.dot(w2.T)
    z1_delta = z1_error * sigmoidPrime(z1)

    w2 -= z1.T.dot(o_delta) * ALPHA
    w1 -= X.T.dot(z1_delta) * ALPHA

    print(np.mean(np.abs(o_error))) # prints the loss over all entries

Such an approach might not work with some neural network libraries, but that shouldn't matter, because neural network libraries usually handle batching like this themselves.
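For example, with scikit-learn (if using a library is an option for you; the solver and iteration count here are just guesses that tend to work on tiny datasets like XOR), you hand it the whole multidimensional training set in one call:

from sklearn.neural_network import MLPClassifier
import numpy as np

X = np.array([[1,0],[0,1],[1,1],[0,0]])
y = np.array([1, 1, 0, 0]) # class labels, one per entry

# lbfgs tends to converge well on very small datasets
clf = MLPClassifier(hidden_layer_sizes=(12,), solver='lbfgs', max_iter=1000)
clf.fit(X, y) # trains on all four entries at once
print(clf.predict(X)) # should print [1 1 0 0] once trained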

The reason this works is that during the dot product between the input and the hidden layer, each training entry is matrix-multiplied with the entire hidden layer individually, so the result is a matrix containing each sample's result after being forwarded through the hidden layer.

This process continues throughout the entire network, so what you are essentially doing is running multiple instances of the same neural network in parallel.
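A quick way to convince yourself is to print the shapes at each step, reusing X, w1, w2 and sigmoid from the code above:

print(X.shape) # (4, 2) - four entries, two inputs each
print(X.dot(w1).shape) # (4, 12) - all four entries forwarded through the hidden layer
print(sigmoid(X.dot(w1)).dot(w2).shape) # (4, 1) - one answer per entry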

The number of training entries doesn't have to be four; it can be arbitrarily large, as long as each row of X is the size of the input layer, each row of y is the size of the output layer, X and y have the same number of rows, and you have enough RAM.
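If you want to make those constraints explicit, a few assertions before the training loop will catch mismatched data early (using the variable names from the code above):

assert X.shape[1] == INPUTS # each entry must be the size of the input layer
assert y.shape[1] == OUTPUTS # each answer must be the size of the output layer
assert X.shape[0] == y.shape[0] # one expected answer per entry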

Also, nothing about the neural network architecture fundamentally changes compared to using single entries; only the data fed into it changes, so you most likely don't have to scrap the code you've written, just make a few small changes.
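For instance, once the network is trained on the whole batch, the exact same forward pass still answers a single entry, because a single entry is just a batch of length one:

single = np.array([[1, 0]]) # one entry, shape (1, 2)
print(sigmoid(sigmoid(single.dot(w1)).dot(w2))) # should be close to 1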

Upvotes: 0
