Reputation: 7683
I am reading a book, and found an error as below:
def relu(x):
return (x>0)*x
def relu2dev(x):
return (x>0)
street_lights = np.array([[1,0,1],[0,1,1],[0,0,1],[1,1,1]])
walk_stop = np.array([[1,1,0,0]]).T
alpha = 0.2
hidden_size = 4
weights_0_1 = 2*np.random.random((3,hidden_size))-1
weights_1_2 = 2*np.random.random((hidden_size,1))-1
for it in range(60):
layer_2_error = 0;
for i in range(len(street_lights)):
layer_0 = street_lights[i:i+1]
layer_1 = relu(np.dot(layer_0,weights_0_1))
layer_2 = np.dot(layer_1,weights_1_2)
layer_2_delta = (layer_2-walk_stop[i:i+1])
# -> layer_2_delta's shape is (1,1), so why np.sum?
layer_2_error += np.sum((layer_2_delta)**2)
layer_1_delta = layer_2_delta.dot(weights_1_2.T) * relu2dev(layer_1)
weights_1_2 -= alpha * layer_1.T.dot(layer_2_delta)
weights_0_1 -= alpha * layer_0.T.dot(layer_1_delta)
if(it % 10 == 9):
print("Error: " + str(layer_2_error))
The error place is commented with # ->
:
layer_2_delta
's shape is (1,1)
, so why would one use np.sum
? I think np.sum
can be removed, but not quite sure, since it comes from a book.
Upvotes: 2
Views: 30
Reputation: 1350
As you say, layer_2_delta
has a shape of (1,1). This means it is a 2 dimensional array with one element: layer_2_delta = np.array([[X]])
. However, layer_2_error
is a scalar. So you can get the scalar from the array by either selecting the value at the first index (layer_2_delta[0,0]
) or by summing all the elements (which in this case is just the one). As the book seems to use "sum of square errors", it seems natural to keep the notation which is square each element in array and then add all of these up (for instruction purposes): this would be more general (e.g., to cases where the layer has more than one element) than the index approach. But you're right, there could be other ways to do this :).
Upvotes: 1