Jonas Palačionis

Reputation: 4842

Random weight initialisation influence on a simple neural network

I am following a book which has the following code:


import numpy as np

np.random.seed(1)

streetlights = np.array([[1, 0, 1], [0, 1, 1], [0, 0, 1], [1, 1, 1]])

walk_vs_stop = np.array([[1, 1, 0, 0]]).T


def relu(x):
    # ReLU activation: pass positive values through, zero out the rest
    return (x > 0) * x


def relu2deriv(output):
    # derivative of ReLU: 1 where the output was positive, 0 elsewhere
    return output > 0


alpha = 0.2
hidden_layer_size = 4

# random weights from the first layer to the second
weights_0_1 = 2*np.random.random((3, hidden_layer_size)) -1
# random weights from the second layer to the output
weights_1_2 = 2*np.random.random((hidden_layer_size, 1)) -1


for iteration in range(60):
    layer_2_error = 0
    for i in range(len(streetlights)):
        # forward pass: input -> hidden (ReLU) -> output (ReLU)
        layer_0 = streetlights[i : i + 1]
        layer_1 = relu(np.dot(layer_0, weights_0_1))
        layer_2 = relu(np.dot(layer_1, weights_1_2))

        layer_2_error += np.sum((layer_2 - walk_vs_stop[i : i + 1])) ** 2

        # backpropagation: deltas for each layer, then weight updates
        layer_2_delta = layer_2 - walk_vs_stop[i : i + 1]
        layer_1_delta = layer_2_delta.dot(weights_1_2.T) * relu2deriv(layer_1)

        weights_1_2 -= alpha * layer_1.T.dot(layer_2_delta)
        weights_0_1 -= alpha * layer_0.T.dot(layer_1_delta)

    if iteration % 10 == 9:
        print(f"Error: {layer_2_error}")

Which outputs:

# Error: 0.6342311598444467
# Error: 0.35838407676317513
# Error: 0.0830183113303298
# Error: 0.006467054957103705
# Error: 0.0003292669000750734
# Error: 1.5055622665134859e-05

I understand everything else, but this part is not explained and I am not sure why it is written the way it is:

weights_0_1 = 2*np.random.random((3, hidden_layer_size)) -1
weights_1_2 = 2*np.random.random((hidden_layer_size, 1)) -1

I don't understand:

  1. Why the whole matrix is multiplied by 2, and why 1 is subtracted afterwards
  2. Why changing the 2 to 3 makes my error much lower: # Error: 5.616513576418916e-13
  3. Why, when I change the 2 and the -1 to various other numbers, I mostly get # Error: 2.0, or an error much worse than with the combination of 3 and -1

I can't seem to grasp the relationship and the purpose of multiplying the random weights by a number and subtracting another number afterwards.

P.S. The idea of the network is to learn a streetlight pattern: when people should walk and when they should stop, depending on which combination of lights in the streetlight is on/off.

Upvotes: 0

Views: 565

Answers (2)

mujjiga

Reputation: 16916

2*np.random.random((3, 4)) -1 is a way to generate 3*4=12 random numbers from a uniform distribution over the half-open interval [-1, +1), i.e. including -1 but excluding +1.

This is equivalent to the more readable code

np.random.uniform(-1, 1, (3, 4))
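
As a quick check, here is a minimal sketch confirming that both expressions produce values in the same half-open interval (they consume the random stream differently, so the individual numbers will not match):

import numpy as np

np.random.seed(1)
w = 2 * np.random.random((3, 4)) - 1   # values in [-1, +1)
u = np.random.uniform(-1, 1, (3, 4))   # same interval, more readable
print(w.min(), w.max())                # both extremes stay inside [-1, +1)
print(u.min(), u.max())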

Upvotes: 0

CoMartel

Reputation: 3591

There are many ways to initialize a neural network, and it is an active research subject, as initialization can have a great impact on performance and training time. Some rules of thumb:

  • avoid giving all weights the same value, as they would all update the same way (see the sketch after this list)
  • avoid weights that are too large, which could make your gradients explode
  • avoid weights that are too small, which could make your gradients vanish
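
A minimal sketch of the first point, using the first streetlight row from your question as input: if every weight starts at the same constant, every hidden unit computes the same activation, so backpropagation sends each one the same gradient and they never learn distinct features:

import numpy as np

x = np.array([[1.0, 0.0, 1.0]])   # first streetlight pattern
w_const = np.full((3, 4), 0.5)    # one value for all weights
print(np.dot(x, w_const))         # [[1. 1. 1. 1.]] - all hidden units identical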

In your case, the goal is just to end up with values in [-1; 1) (see the sketch after these steps):

  1. np.random.random gives you a float in [0; 1)
  2. multiplying by 2 gives you a float in [0; 2)
  3. subtracting 1 gives you a float in [-1; 1)
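
A minimal sketch tracing those three steps:

import numpy as np

np.random.seed(1)
r = np.random.random((3, 4))   # step 1: floats in [0; 1)
r = 2 * r                      # step 2: floats in [0; 2)
r = r - 1                      # step 3: floats in [-1; 1)
print(r.min(), r.max())        # both stay inside [-1; 1)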

Upvotes: 2
