Jonas Palačionis

Reputation: 4842

Random weight initialisation influence on a simple neural network

I am following a book which has the following code:


import numpy as np

np.random.seed(1)

streetlights = np.array([[1, 0, 1], [0, 1, 1], [0, 0, 1], [1, 1, 1]])

walk_vs_stop = np.array([[1, 1, 0, 0]]).T


def relu(x):
    # ReLU activation: pass positive values through, zero out the rest
    return (x > 0) * x


def relu2deriv(output):
    # derivative of ReLU: 1 where the output was positive, 0 elsewhere
    return output > 0


alpha = 0.2
hidden_layer_size = 4

# random weights from the first layer to the second
weights_0_1 = 2*np.random.random((3, hidden_layer_size)) -1
# random weights from the second layer to the output
weights_1_2 = 2*np.random.random((hidden_layer_size, 1)) -1


for iteration in range(60):
    layer_2_error = 0
    for i in range(len(streetlights)):
        # forward pass: input -> hidden (ReLU) -> output (ReLU)
        layer_0 = streetlights[i : i + 1]
        layer_1 = relu(np.dot(layer_0, weights_0_1))
        layer_2 = relu(np.dot(layer_1, weights_1_2))

        layer_2_error += np.sum((layer_2 - walk_vs_stop[i : i + 1])) ** 2

        # backpropagation: deltas for each layer, then weight updates
        layer_2_delta = layer_2 - walk_vs_stop[i : i + 1]
        layer_1_delta = layer_2_delta.dot(weights_1_2.T) * relu2deriv(layer_1)

        weights_1_2 -= alpha * layer_1.T.dot(layer_2_delta)
        weights_0_1 -= alpha * layer_0.T.dot(layer_1_delta)

    if iteration % 10 == 9:
        print(f"Error: {layer_2_error}")

Which outputs:

# Error: 0.6342311598444467
# Error: 0.35838407676317513
# Error: 0.0830183113303298
# Error: 0.006467054957103705
# Error: 0.0003292669000750734
# Error: 1.5055622665134859e-05

I understand everything else, but this part is not explained and I am not sure why it is written the way it is:

weights_0_1 = 2*np.random.random((3, hidden_layer_size)) -1
weights_1_2 = 2*np.random.random((hidden_layer_size, 1)) -1

I don't understand:

  1. Why the whole matrix is multiplied by 2, and why 1 is subtracted afterwards
  2. Why changing the 2 to 3 makes my error much lower: # Error: 5.616513576418916e-13
  3. Why, when I change the 2 and the -1 to various other numbers, I mostly get # Error: 2.0, or an error much worse than with the combination of 3 and -1

I can't seem to grasp the relationship and the purpose of multiplying the random weights by a number and subtracting another number afterwards.

P.S. The idea of the network is to learn a streetlight pattern: when people should walk and when they should stop, depending on which combination of lights in the streetlight is on/off.

Upvotes: 0

Views: 565

Answers (2)

mujjiga

Reputation: 16916

2*np.random.random((3, 4)) -1 is a way to generate 3*4=12 random numbers from a uniform distribution over the half-open interval [-1, +1), i.e. including -1 but excluding +1.

This is equivalent to the more readable code

np.random.uniform(-1, 1, (3, 4))
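
As a quick check, here is a minimal sketch confirming that both expressions produce values in the same half-open interval (they consume the random stream differently, so the individual numbers will not match):

import numpy as np

np.random.seed(1)
w = 2 * np.random.random((3, 4)) - 1   # values in [-1, +1)
u = np.random.uniform(-1, 1, (3, 4))   # same interval, more readable
print(w.min(), w.max())                # both extremes stay inside [-1, +1)
print(u.min(), u.max())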

Upvotes: 0

CoMartel

Reputation: 3591

There are many ways to initialize a neural network, and it is an active research subject, as initialization can have a great impact on performance and training time. Some rules of thumb:

  • avoid giving all weights the same value, as they would all update the same way (see the sketch after this list)
  • avoid weights that are too large, which could make your gradients explode
  • avoid weights that are too small, which could make your gradients vanish
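
A minimal sketch of the first point, using the first streetlight row from your question as input: if every weight starts at the same constant, every hidden unit computes the same activation, so backpropagation sends each one the same gradient and they never learn distinct features:

import numpy as np

x = np.array([[1.0, 0.0, 1.0]])   # first streetlight pattern
w_const = np.full((3, 4), 0.5)    # one value for all weights
print(np.dot(x, w_const))         # [[1. 1. 1. 1.]] - all hidden units identical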

In your case, the goal is just to end up with values in [-1; 1) (see the sketch after these steps):

  1. np.random.random gives you a float in [0; 1)
  2. multiplying by 2 gives you a float in [0; 2)
  3. subtracting 1 gives you a float in [-1; 1)
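
A minimal sketch tracing those three steps:

import numpy as np

np.random.seed(1)
r = np.random.random((3, 4))   # step 1: floats in [0; 1)
r = 2 * r                      # step 2: floats in [0; 2)
r = r - 1                      # step 3: floats in [-1; 1)
print(r.min(), r.max())        # both stay inside [-1; 1)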

Upvotes: 2
