Reputation:
I am playing with Torch7 these days.
Today, I implemented a Perceptron and a multilayer perceptron (MLP) for solving XOR.
As expected, the MLP works well on XOR and the Perceptron does not.
But I was curious what the result would be if the number of hidden nodes were one.
I expected the MLP's result to be the same as the Perceptron's, because it has only 1 hidden node.
But interestingly, the MLP was better than the Perceptron.
In more detail, the Perceptron gets 0.25 error (as expected), but the MLP with 1 hidden node gets approximately 0.16 error.
I thought that one hidden node acts as one line in the problem space.
So, if there is only one hidden node, it should be the same as the Perceptron.
But this result told me I was wrong.
Now, I want to know why the MLP with 1 hidden node is better than the Perceptron.
Please explain why this result happens.
Thank you very much.
The following is the Perceptron code:
-- perceptron
require 'nn'
-- data
data = torch.Tensor({ {0, 0}, {0, 1}, {1, 0}, {1, 1} })
-- target
target = torch.Tensor({ 0, 1, 1, 0 })
-- model
perceptron = nn.Linear(2, 1)
-- loss function
criterion = nn.MSECriterion()
-- training
for i = 1, 10000 do
    -- set gradients to zero
    perceptron:zeroGradParameters()
    -- compute output
    output = perceptron:forward(data)
    -- compute loss
    loss = criterion:forward(output, target)
    -- compute gradients w.r.t. output
    dldo = criterion:backward(output, target)
    -- compute gradients w.r.t. parameters
    perceptron:backward(data, dldo)
    -- gradient descent with learningRate = 0.1
    perceptron:updateParameters(0.1)
    print(loss)
end
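To see what the trained Perceptron actually does, I print its final predictions and parameters (a quick sketch, assuming the training loop above has already run). Since a single linear unit cannot separate XOR, its least-squares fit ends up near a constant 0.5 for every input, which matches the 0.25 error I reported:
-- inspect the trained Perceptron (a sketch; run after the training loop above)
print(perceptron:forward(data))   -- predictions for the four XOR inputs
print(perceptron.weight)          -- learned weights
print(perceptron.bias)            -- learned bias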
And the following is the code for the MLP with 1 hidden node:
-- multilayer perceptron
require 'nn'
-- data
data = torch.Tensor({ {0, 0}, {0, 1}, {1, 0}, {1, 1} })
-- target
target = torch.Tensor({ 0, 1, 1, 0 })
-- model
multilayer = nn.Sequential()
inputs = 2; outputs = 1; HUs = 1;
multilayer:add(nn.Linear(inputs, HUs))
multilayer:add(nn.Tanh())
multilayer:add(nn.Linear(HUs, outputs))
-- loss function
criterion = nn.MSECriterion()
-- training
for i = 1, 10000 do
    -- set gradients to zero
    multilayer:zeroGradParameters()
    -- compute output
    output = multilayer:forward(data)
    -- compute loss
    loss = criterion:forward(output, target)
    -- compute gradients w.r.t. output
    dldo = criterion:backward(output, target)
    -- compute gradients w.r.t. parameters
    multilayer:backward(data, dldo)
    -- gradient descent with learningRate = 0.1
    multilayer:updateParameters(0.1)
    print(loss)
end
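For comparison, the same kind of check on the MLP (again a sketch that assumes the loop above has already run) shows what the single hidden node learns. The model's output is an affine function of tanh of an affine function of the input, i.e. a saturating curve over one projection of the input rather than a straight line, which may be related to why its error can drop below 0.25:
-- inspect the trained MLP (a sketch; run after the training loop above)
print(multilayer:forward(data))   -- predictions for the four XOR inputs
print(multilayer:get(1).weight)   -- hidden-layer weight (1x2)
print(multilayer:get(3).weight)   -- output-layer weight (1x1)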
Upvotes: 0
Views: 281
Reputation: 1
This difference in error is probably due to a difference in the learning rate. The number of epochs you used is high enough for the models to converge. To fix this, keep lowering the learning rate in both cases, down to approximately 1e-4, as in the sketch below.
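For example, the only line that needs to change in each script is the parameter update (a sketch of the suggested change; 1e-4 is the value mentioned above):
-- gradient descent with a smaller learning rate, as suggested above
perceptron:updateParameters(1e-4)
-- and likewise in the MLP script:
multilayer:updateParameters(1e-4)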
Upvotes: 0