Reputation: 11
I implemented an MLP with the backpropagation algorithm, and it works fine for a single entry; for example, if the input is 1 and 1, the answers on the last layer will be 1 and 0.
Let's suppose that instead of having only one entry (like 1,1) I have four entries (1,1; 1,0; 0,0; 0,1), each with a different expected answer.
I need to train this MLP so that it answers all entries correctly.
I can't find a way to do this. Let's suppose that I have 1000 epochs; would I need to train each entry for 250 epochs, or train one epoch with one entry and the next epoch with another?
How could I properly train an MLP to answer all entries correctly?
Upvotes: 0
Views: 268
Reputation: 23
at least for a Python implementation, you can simply use multidimensional training data:
# training a neural network to behave like an XOR gate
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoidPrime(s):
    # derivative of the sigmoid, given a value s that is already sigmoid(x)
    return s * (1 - s)

X = np.array([[1,0],[0,1],[1,1],[0,0]]) # entries
y = np.array([[1],[1],[0],[0]]) # expected answers

INPUTS = X.shape[1]
HIDDEN = 12
OUTPUTS = y.shape[1]

w1 = np.random.randn(INPUTS, HIDDEN) * np.sqrt(2 / INPUTS)
w2 = np.random.randn(HIDDEN, OUTPUTS) * np.sqrt(2 / HIDDEN)

ALPHA = 0.5
EPOCHS = 1000

for e in range(EPOCHS):
    # forward pass: every row of X goes through the network at once
    z1 = sigmoid(X.dot(w1))
    o = sigmoid(z1.dot(w2))
    # backward pass
    o_error = o - y
    o_delta = o_error * sigmoidPrime(o)
    z1_error = o_delta.dot(w2.T) # backpropagate through w2 before updating it
    z1_delta = z1_error * sigmoidPrime(z1)
    w2 -= z1.T.dot(o_delta) * ALPHA
    w1 -= X.T.dot(z1_delta) * ALPHA

print(np.mean(np.abs(o_error))) # prints the final loss of the NN
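to sanity-check the training, you can run one more forward pass with the trained weights and round the outputs (a minimal sketch reusing the variables from the code above):

preds = sigmoid(sigmoid(X.dot(w1)).dot(w2)) # forward pass with the trained weights
print(np.round(preds)) # should come out close to y: [[1.],[1.],[0.],[0.]]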
such an approach might not carry over directly to every neural network library, but that shouldn't matter, because libraries usually handle batching like this themselves
the reason this works is that during the dot product between the input and the hidden-layer weights, each training entry (each row of X) gets matrix-multiplied with the entire weight matrix individually, so the result is a matrix containing one row per sample forwarded through the hidden layer
and this process continues throughout the entire network, so what you are essentially doing is running multiple instances of the same neural network in parallel
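you can see the shapes for yourself with a minimal sketch (the array names, contents, and sizes here are just illustrative):

import numpy as np
X_demo = np.zeros((4, 2))    # 4 samples, 2 inputs each
w1_demo = np.zeros((2, 12))  # input-to-hidden weights
print(X_demo.dot(w1_demo).shape) # (4, 12): one hidden-layer row per sample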
the number of training entries doesn't have to be four; it can be arbitrarily high, as long as each row of X is the size of the input layer, each row of y is the size of the output layer, and X and y have the same number of rows (and you have enough RAM)
also, nothing about the neural network architecture fundamentally changes compared to using single entries; only the data that is fed into it changes, so you don't have to scrap the code you've written, most likely just make a few small changes
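for instance, a single entry is just a batch with one row, so the exact same forward pass works unchanged (a sketch reusing sigmoid, w1, and w2 from the code above):

x_single = np.array([[1, 1]]) # shape (1, 2): a batch containing one sample
o_single = sigmoid(sigmoid(x_single.dot(w1)).dot(w2))
print(o_single.shape) # (1, 1): one answer for the one sample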
Upvotes: 0