Reputation: 113
I wrote the following backpropagation algorithm to model the two-input identity function
clc
% clear
nh = 3;     % neurons in hidden layer
ni = 2;     % neurons in input layer
eta = .001; % the learning rate
trainingSize = 100;
input = rand(trainingSize, ni); % random training inputs in [0,1)
test = input;                   % identity function: expected output equals the input
nk = size(test,2); % neurons in output layer
b1 = rand(1,nh);%+ .5;
b2 = rand(1,nk);%- .5;
w1 = rand(nh,ni) + .5;
w2 = rand(nk,nh) - .5;
figure
hold on;
for iter = 1:5000
    errSq = 0;
    for x = 1:trainingSize
        a0 = input(x,:); % current training input
        ex = test(x,:);  % expected output
        [a1, a2] = feedForward(a0, w1, w2, b1, b2);
        del2 = (a2-ex) .* (1-a2) .* (a2);     % output-layer delta (error times sigmoid derivative)
        del1 = (del2 * w2) .* (1-a1) .* (a1); % hidden-layer delta, backpropagated through w2
        delB2 = del2;
        delB1 = del1;
        delW2 = zeros(nk,nh);
        for i = 1:nh
            for j = 1:nk
                delW2 = a1(i) * del2(j);
            end
        end
        for i = 1:ni
            for j = 1:nh
                delW1 = a0(i) * del1(j);
            end
        end
        b2 = b2 - eta * delB2;
        b1 = b1 - eta * delB1;
        w2 = w2 - eta * delW2;
        w1 = w1 - eta * delW1;
        errSq = errSq + sum(a2-ex) .* sum(a2-ex);
    end
    cost = errSq / (2 * trainingSize);
    plot(iter, cost, 'o');
    if cost < 0.005
        cost
        break
    end
end
cost
The feedForward function:
function [a1, a2] = feedForward(a0, w1, w2, b1, b2)
    z1 = a0 * w1' + b1; % hidden-layer pre-activation
    a1 = sig(z1);       % hidden-layer activation
    z2 = a1 * w2' + b2; % output-layer pre-activation
    a2 = sig(z2);       % output-layer activation
end
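For completeness, the sig helper is just the element-wise logistic sigmoid (consistent with the (1-a) .* a derivative factors in the deltas above):
function a = sig(z)
    % Element-wise logistic sigmoid; its derivative sig(z).*(1-sig(z))
    % matches the (1-a) .* a factors used in the backprop deltas
    a = 1 ./ (1 + exp(-z));
end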
[Figure 1: the cost function plot]
Now, what am I messing up?
Is it some programming error that has escaped my notice, or am I implementing the algorithm incorrectly?
When I test the resulting weights, the calculated cost is as low as during training, but the network's outputs are completely wrong.
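The test is just a forward pass on fresh random inputs; a minimal sketch (probe and out are illustrative names, not from the script above):
probe = rand(5, ni);                           % fresh random two-input samples
[~, out] = feedForward(probe, w1, w2, b1, b2); % trained network's predictions
disp([probe, out]);                            % the inputs double as the expected outputs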
[Figure: blue = expected output; red = output of the neural network]
Also, why does the cost value sometimes rise before decreasing (as in figure 1)?
Upvotes: 0
Views: 68
Reputation: 113
As it turns out, I needed to correct a programming bug:
delW2 = zeros(nk,nh);
for i = 1:nh
    for j = 1:nk
        delW2(j,i) = a1(i) * del2(j); % forgot the index for delW2
    end
end
delW1 = zeros(nh,ni); % initialize delW1 (although this is optional)
for i = 1:ni
    for j = 1:nh
        delW1(j,i) = a0(i) * del1(j); % forgot the index for delW1
    end
end
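As an aside, each pair of loops just fills in an outer product, so the corrected updates can also be written without loops:
delW2 = del2' * a1; % (nk x 1) * (1 x nh), so delW2(j,i) = del2(j) * a1(i)
delW1 = del1' * a0; % (nh x 1) * (1 x ni), so delW1(j,i) = del1(j) * a0(i)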
And eliminate the sigmoid function from the output layer, i.e., make the output layer linear, to get acceptable results.
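The forward pass then becomes (a sketch; feedForwardLinOut is just an illustrative name for the modified function):
function [a1, a2] = feedForwardLinOut(a0, w1, w2, b1, b2)
    % Same as feedForward, but the output layer is linear (no sigmoid)
    z1 = a0 * w1' + b1;
    a1 = sig(z1);
    a2 = a1 * w2' + b2; % linear output: a2 = z2
end
In the training loop, the matching change is del2 = (a2 - ex); the (1-a2) .* (a2) factor drops out because a linear activation's derivative is 1.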
I'm not sure why the linear output layer is strictly required, though, and would like comments on that.
Upvotes: 1