David Lerner

Reputation: 344

Error in backpropagation Python neural net

Darn thing just won't learn. Sometimes the weights seem to become NaN.

I haven't experimented with different numbers of hidden layers, inputs, or outputs, but the bug appears consistently across different hidden-layer sizes.

from __future__ import division
import numpy
import matplotlib.pyplot
import random

class Net:
    def __init__(self, *sizes):
        sizes = list(sizes)
        sizes[0] += 1 #the extra input unit acts as a constant-1 bias (prepended in y())
        self.sizes = sizes
        self.weights = [numpy.random.uniform(-1, 1, (sizes[i+1],sizes[i])) for i in range(len(sizes)-1)]

    @staticmethod
    def activate(x):    
        return 1/(1+numpy.exp(-x))


    def y(self, x_):
        x = numpy.concatenate(([1], numpy.atleast_1d(x_.copy())))
        o = [x] #o[i] is the (activated) output of hidden layer i, "hidden layer 0" is inputs
        for weight in self.weights[:-1]:
            x = weight.dot(x)
            x = Net.activate(x)
            o.append(x)
        o.append(self.weights[-1].dot(x))
        return o    

    def __call__(self, x):
        return self.y(x)[-1]

    def delta(self, x, t):
        o = self.y(x)
        delta = [(o[-1]-t) * o[-1] * (1-o[-1])]
        for i, weight in enumerate(reversed(self.weights)):
            delta.append(weight.T.dot(delta[-1]) * o[-i-2] * (1-o[-i-2]))
        delta.reverse()
        return o, delta            

    def train(self, inputs, outputs, epochs=100, rate=.1):
        for epoch in range(epochs):
            pairs = zip(inputs, outputs)
            random.shuffle(pairs)
            for x, t in pairs: #shuffle? subset? 
                o, d = self.delta(x, t)
                for layer in range(len(self.sizes)-1):
                    self.weights[layer] -=  rate * numpy.outer(o[layer+1], d[layer])


n = Net(1, 4, 1)
x = numpy.linspace(0, 2*3.14, 10)
t = numpy.sin(x)
matplotlib.pyplot.plot(x, t, 'g')
matplotlib.pyplot.plot(x, map(n, x), 'r')
n.train(x, t)
print n.weights
matplotlib.pyplot.plot(x, map(n, x), 'b')
matplotlib.pyplot.show()

Upvotes: 0

Views: 1172

Answers (2)

David Lerner

Reputation: 344

I fixed it! Thanks for all the suggestions. I worked out numeric partials and found that my o and deltas were correct, but I was multiplying the wrong ones together. That's why I now take numpy.outer(d[layer+1], o[layer]) instead of numpy.outer(o[layer+1], d[layer]).

I was also skipping the update on one layer. That's why I changed for layer in range(self.hidden_layers) to for layer in range(self.hidden_layers+1).

I'll add that I caught a bug just before posting originally: my output-layer delta was incorrect because my net (intentionally) doesn't activate the final outputs, but the delta was computed as though it did.

I debugged primarily with a one-hidden-layer, one-hidden-unit net, then moved on to a model with 2 inputs, 3 hidden layers of 2 neurons each, and 2 outputs.

from __future__ import division
import numpy
import scipy
import scipy.special
import matplotlib.pyplot
#from pylab import *

#numpy.random.seed(23)

def nmap(f, x):
    return numpy.array(map(f, x))

class Net:
    def __init__(self, *sizes):
        self.hidden_layers = len(sizes)-2
        self.weights = [numpy.random.uniform(-1, 1, (sizes[i+1],sizes[i])) for i in range(self.hidden_layers+1)]

    @staticmethod
    def activate(x):
        return scipy.special.expit(x)
        #return 1/(1+numpy.exp(-x))

    @staticmethod
    def activate_(x):
        s = scipy.special.expit(x)
        return s*(1-s)

    def y(self, x):
        o = [numpy.array(x)] #o[i] is the (activated) output of hidden layer i, "hidden layer 0" is inputs and not activated
        for weight in self.weights[:-1]:
            o.append(Net.activate(weight.dot(o[-1])))
        o.append(self.weights[-1].dot(o[-1]))
#        for weight in self.weights:
#            o.append(Net.activate(weight.dot(o[-1])))
        return o

    def __call__(self, x):
        return self.y(x)[-1]

    def delta(self, x, t):
        x = numpy.array(x)
        t = numpy.array(t)
        o = self.y(x)
        #delta = [(o[-1]-t) * o[-1] * (1-o[-1])]
        delta = [o[-1]-t]
        for i, weight in enumerate(reversed(self.weights)):
            delta.append(weight.T.dot(delta[-1]) * o[-i-2] * (1-o[-i-2]))
        delta.reverse() #surely i need this
        return o, delta

    def train(self, inputs, outputs, epochs=1000, rate=.1):
        errors = []
        for epoch in range(epochs):
            for x, t in zip(inputs, outputs): #shuffle? subset?
                o, d = self.delta(x, t)
                for layer in range(self.hidden_layers+1):
                    grad = numpy.outer(d[layer+1], o[layer])
                    self.weights[layer] -=  rate * grad

        return errors

    def rmse(self, inputs, outputs):
        return ((outputs - nmap(self, inputs))**2).sum()**.5/len(inputs)



n = Net(1, 8, 1)
X = numpy.linspace(0, 2*3.1415, 10)
T = numpy.sin(X)
Y = map(n, X)
Y = numpy.array([y[0,0] for y in Y])
matplotlib.pyplot.plot(X, T, 'g')
matplotlib.pyplot.plot(X, Y, 'r')
print 'output successful'
print n.rmse(X, T)
errors = n.train(X, T)
print 'tried to train successfully'
print n.rmse(X, T)
Y = map(n, X)
Y = numpy.array([y[0,0] for y in Y])
matplotlib.pyplot.plot(X, Y, 'b')
matplotlib.pyplot.show()

Upvotes: 1

user2489252

Reputation:

I haven't looked for a particular bug in your code, but can you please try the following things to narrow down your problem further? Otherwise it is very tedious to find the needle in the haystack.

1) Please try to use a real dataset to get an idea of what to expect, e.g., MNIST, and/or standardize your data, because your weights may become NaN if the values get too small.
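
A minimal sketch of the standardization step, assuming your inputs sit in a NumPy array X (zero mean, unit variance per feature; standardize is just an illustration name):

import numpy

def standardize(X):
    # rescale to zero mean and unit variance (feature-wise for 2-D data)
    X = numpy.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma = numpy.where(sigma == 0, 1.0, sigma)  # avoid dividing by zero for constant features
    return (X - mu) / sigma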

2) Try different learning rates and plot the cost function vs. epochs to check whether you are converging. It should look somewhat like this (note that I used minibatch learning and averaged the cost over the minibatch chunks for each epoch).

[figure: cost function vs. epochs, decreasing toward convergence]
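
A rough sketch of that bookkeeping, reusing your Net class and the n, x, t variables from your script (train_and_record is only an illustration, not part of your code); here I just record the summed squared error once per epoch and plot it:

import numpy
import matplotlib.pyplot

def train_and_record(net, inputs, targets, epochs=200, rate=0.1):
    costs = []
    for epoch in range(epochs):
        net.train(inputs, targets, epochs=1, rate=rate)  # one pass over the data
        cost = sum(((net(x) - t) ** 2).sum() for x, t in zip(inputs, targets))
        costs.append(cost)
    return costs

costs = train_and_record(n, x, t)
matplotlib.pyplot.plot(costs)
matplotlib.pyplot.xlabel('epoch')
matplotlib.pyplot.ylabel('cost')
matplotlib.pyplot.show()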

3) I see that you are using a sigmoid activation. Your implementation is correct, but to make it numerically more stable, replace 1.0 / (1.0 + np.exp(-z)) with expit(z) from scipy.special (the same function, but more efficient).
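
For instance (a small illustration, not from your code), the hand-rolled version emits an overflow warning for large negative inputs, while expit handles them quietly:

import numpy
from scipy.special import expit

z = numpy.array([-1000.0, 0.0, 1000.0])
print(1.0 / (1.0 + numpy.exp(-z)))  # overflow warning in exp for z = -1000, result [0., 0.5, 1.]
print(expit(z))                     # same values, no warning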

4) Implement gradient checking. Here, you compare the analytical solution to a numerically approximated gradient, for example the one-sided difference quotient

dJ/dw ≈ (J(w + ε) - J(w)) / ε

Or an even better approach that yields a more accurate approximation of the gradient is to compute the symmetric (or centered) difference quotient given by the two-point formula

dJ/dw ≈ (J(w + ε) - J(w - ε)) / (2ε)
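
A rough sketch of such a check against your Net class, taking the cost for a single pair to be the squared error 0.5*(net(x) - t)**2 (numeric_grad and backprop_grad are just illustration names). If the two gradients disagree by much, the backpropagation code is the place to look:

import numpy

def loss(net, x, t):
    # squared error for a single training pair
    return 0.5 * ((net(x) - t) ** 2).sum()

def numeric_grad(net, x, t, layer, eps=1e-5):
    # perturb each weight in turn and apply the centered difference quotient
    grad = numpy.zeros_like(net.weights[layer])
    for idx in numpy.ndindex(*net.weights[layer].shape):
        old = net.weights[layer][idx]
        net.weights[layer][idx] = old + eps
        j_plus = loss(net, x, t)
        net.weights[layer][idx] = old - eps
        j_minus = loss(net, x, t)
        net.weights[layer][idx] = old
        grad[idx] = (j_plus - j_minus) / (2 * eps)
    return grad

def backprop_grad(net, x, t, layer):
    # the gradient exactly as train() in the question combines o and delta
    o, d = net.delta(x, t)
    return numpy.outer(o[layer + 1], d[layer])

net = Net(1, 4, 1)
for layer in range(len(net.weights)):
    diff = numpy.abs(numeric_grad(net, 0.5, 0.3, layer) - backprop_grad(net, 0.5, 0.3, layer)).max()
    print('layer %d: max abs difference %.3e' % (layer, diff))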

PS: If you are interested and find it useful, I have a working vanilla NumPy neural net implemented here.

Upvotes: 3
