jorgenkg

Reputation: 4275

MLP Neural Network: calculating the gradient (matrices)

What is a good implementation for calculating the gradient in a n-layered neural network?

Weight layers:

  1. First layer weights:     (n_inputs+1, n_units_layer)-matrix
  2. Hidden layer weights: (n_units_layer+1, n_units_layer)-matrix
  3. Last layer weights:     (n_units_layer+1, n_outputs)-matrix
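
For concreteness, the matrices could be allocated roughly like this in numpy (the layer sizes and the random initialization are just placeholders, and the +1 row is meant to hold the bias weights):

    import numpy

    n_inputs, n_units_layer, n_outputs = 2, 3, 1  # example sizes, chosen arbitrarily

    layer1 = numpy.random.randn(n_inputs + 1, n_units_layer)       # first layer weights
    layer2 = numpy.random.randn(n_units_layer + 1, n_units_layer)  # hidden layer weights
    layer3 = numpy.random.randn(n_units_layer + 1, n_outputs)      # last layer weights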

Some admittedly vague pseudocode:

    weight_layers = [ layer1, layer2, layer3 ]     # a list of the weight matrices described above
    input_values  = [ [0,0], [0,1], [1,0], [1,1] ] # our test set (the four XOR inputs)
    target_output = [ [0], [1], [1], [0] ]         # what we want to train our net to output
    output_layers = []                             # output for the corresponding layers

    for layer in weight_layers:
        output <-- calculate the output     # calculate the output from the current layer
        output_layers <-- output            # store the output from each layer

    n_samples = input_values.shape[0]
    n_outputs = target_output.shape[1]

    error = ( output - target_output )/( n_samples*n_outputs )

    """ calculate the gradient here """

Final implementation

The final implementation is available on GitHub.

Upvotes: 2

Views: 4554

Answers (1)

alfa

Reputation: 3088

With Python and numpy that is easy.

You have two options:

  1. You can either compute everything in parallel for num_instances instances or
  2. you can compute the gradient for one instance (which is actually a special case of 1.).

I will now give some hints on how to implement option 1. I would suggest creating a new class called Layer. It should have two functions:

forward:
    inputs:
    X: shape = [num_instances, num_inputs]
        inputs
    W: shape = [num_outputs, num_inputs]
        weights
    b: shape = [num_outputs]
        biases
    g: function
        activation function
    outputs:
    Y: shape = [num_instances, num_outputs]
        outputs


backprop:
    inputs:
    dE/dY: shape = [num_instances, num_outputs]
        backpropagated gradient
    W: shape = [num_outputs, num_inputs]
        weights
    b: shape = [num_outputs]
        biases
    gd: function
        calculates the derivative of g(A) = Y
        based on Y, i.e. gd(Y) = g'(A)
    Y: shape = [num_instances, num_outputs]
        outputs
    X: shape = [num_instances, num_inputs]
        inputs
    outputs:
    dE/dX: shape = [num_instances, num_inputs]
        will be backpropagated (dE/dY of lower layer)
    dE/dW: shape = [num_outputs, num_inputs]
        accumulated derivative with respect to weights
    dE/db: shape = [num_outputs]
        accumulated derivative with respect to biases

The implementation is simple:

def forward(X, W, b, g):
    A = X.dot(W.T) + b # the bias b will be broadcast over the instances
    Y = g(A)           # apply the activation function element-wise
    return Y

def backprop(dEdY, W, b, gd, Y, X):
    Deltas = gd(Y) * dEdY # element-wise multiplication
    dEdX = Deltas.dot(W)
    dEdW = Deltas.T.dot(X)
    dEdb = Deltas.sum(axis=0)
    return dEdX, dEdW, dEdb
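
If you want to package these two functions into the suggested Layer class, a minimal sketch could look like this (assuming numpy is imported; the constructor arguments, attribute names and random initialization are my own assumptions):

class Layer:
    def __init__(self, num_inputs, num_outputs, g, gd):
        self.W = numpy.random.randn(num_outputs, num_inputs) * 0.1  # small random weights
        self.b = numpy.zeros(num_outputs)                           # biases start at zero
        self.g, self.gd = g, gd                                     # activation and its derivative

    def forward(self, X):
        self.X = X                                   # remember the input for backprop
        self.Y = forward(X, self.W, self.b, self.g)  # module-level forward from above
        return self.Y

    def backprop(self, dEdY):
        return backprop(dEdY, self.W, self.b, self.gd, self.Y, self.X)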

The X of the first layer is taken from your dataset, and then you pass each layer's Y as the X of the next layer in the forward pass.
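
A sketch of that chaining (here layers is assumed to be a plain list of (W, b, g, gd) tuples; that convention is mine, not something fixed by this answer):

def forward_pass(X, layers):
    Ys = []                       # store every layer's output for the backward pass
    for W, b, g, gd in layers:    # gd is only needed later, in the backward pass
        X = forward(X, W, b, g)   # this layer's Y becomes the next layer's X
        Ys.append(X)
    return Ys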

The dE/dY of the output layer is computed as Y-T (this holds both for the softmax activation function with the cross-entropy error function and for a linear activation function with the sum of squared errors), where Y is the output of the network (shape = [num_instances, num_outputs]) and T (shape = [num_instances, num_outputs]) is the desired output. Then you can backpropagate, i.e. the dE/dX of each layer becomes the dE/dY of the layer below it.
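
Continuing the sketch above (same assumed (W, b, g, gd) tuples; X0 is the dataset, Ys are the stored forward outputs, T are the targets):

def backward_pass(X0, Ys, T, layers):
    dEdY = Ys[-1] - T                              # dE/dY of the output layer, as described
    grads = []
    for i in reversed(range(len(layers))):
        W, b, g, gd = layers[i]
        X = Ys[i - 1] if i > 0 else X0             # the input that this layer saw
        dEdY, dEdW, dEdb = backprop(dEdY, W, b, gd, Ys[i], X)  # dEdX becomes dEdY of the layer below
        grads.insert(0, (dEdW, dEdb))              # keep the gradients in layer order
    return grads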

Now you can use dE/dW and dE/db of each layer to update W and b.
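
For example, a plain batch gradient descent step over the sketched data structures (the learning rate is an assumed hyperparameter):

learning_rate = 0.1
for i, (dEdW, dEdb) in enumerate(grads):
    W, b, g, gd = layers[i]
    layers[i] = (W - learning_rate * dEdW,         # descend along dE/dW
                 b - learning_rate * dEdb,         # descend along dE/db
                 g, gd)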

Here is an example in C++: OpenANN.

Btw. you can compare the speed of instance-wise and batch-wise forward propagation:

In [1]: import timeit

In [2]: setup = """import numpy
   ...: W = numpy.random.rand(10, 5000)
   ...: X = numpy.random.rand(1000, 5000)"""

In [3]: timeit.timeit('[W.dot(x) for x in X]', setup=setup, number=10)
Out[3]: 0.5420958995819092

In [4]: timeit.timeit('X.dot(W.T)', setup=setup, number=10)
Out[4]: 0.22001314163208008

Upvotes: 2
