Reputation: 1955
I am new to neural networks and am working through my first real example, using the MNIST dataset of handwritten digits. I have written code that, as far as I can tell, should work (at least to some degree), but I cannot figure out why training gets stuck right after the first pass over the training samples. My code is the following:
from keras.datasets import mnist
import numpy as np
def relu(x):
    return (x > 0) * x
def relu_deriv(x):
    return x > 0
(x_train, y_train), (x_test, y_test) = mnist.load_data()
images = x_train[0:1000].reshape(1000, 28*28)
labels = y_train[0:1000]
test_images = x_test[0:1000].reshape(1000, 28*28)
test_labels = y_test[0:1000]
# converting the labels to a matrix
one_hot_labels = np.zeros((len(labels),10))
for i, j in enumerate(labels):
    one_hot_labels[i][j] = 1
labels = one_hot_labels
alpha = 0.005
hidden_size = 5 # size of the hidden layer
# initial weight matrices
w1 = .2 * np.random.random(size=[784, hidden_size]) - .1
w2 = .2 * np.random.random(size=[hidden_size, 10]) - .1
for iteration in range(1000):
    error = 0
    for i in range(len(images)):
        layer_0 = images[i:i+1]
        layer_1 = relu(np.dot(layer_0, w1))
        layer_2 = np.dot(layer_1, w2)
        delta_2 = (labels[i:i+1] - layer_2)
        error += np.sum((delta_2) ** 2)
        delta_1 = delta_2.dot(w2.T) * relu_deriv(layer_1)
        w2 += alpha * np.dot(layer_1.T, delta_2)
        w1 += alpha * np.dot(layer_0.T, delta_1)
    print("error: {0}".format(error))
In the first iteration the error is large, as expected. After that it drops to 1000, but no matter how many more iterations I run, it stays stuck at that value.
Upvotes: 1
Views: 71
Reputation: 308
You haven't normalized the image data. The pixel values range from 0 to 255. Because these values are so large, the weight updates are also large, which leaves you with very large weights after the first iteration. You can normalize the image data as follows:
images = x_train[0:1000].reshape(1000, 28*28)
images = images / 255
labels = y_train[0:1000]
test_images = x_test[0:1000].reshape(1000, 28*28)
test_images = test_images / 255
test_labels = y_test[0:1000]
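To see why the scaling matters, here is a minimal sketch that compares the size of the very first weight update for a raw input and for the same input divided by 255. It reuses the update rule from the question, but everything else is an assumption made for illustration: the image is random synthetic data rather than MNIST, the `update_norms` helper is hypothetical, and the hidden layer is widened to 16 units so that at least some ReLUs are active in this demo.

```python
import numpy as np

def relu(x):
    return (x > 0) * x

def relu_deriv(x):
    return x > 0

def update_norms(layer_0, label, w1, w2, alpha=0.005):
    """Hypothetical helper: Frobenius norms of the first w1 and w2 updates."""
    layer_1 = relu(np.dot(layer_0, w1))
    layer_2 = np.dot(layer_1, w2)
    delta_2 = label - layer_2
    delta_1 = delta_2.dot(w2.T) * relu_deriv(layer_1)
    step_w2 = alpha * np.dot(layer_1.T, delta_2)
    step_w1 = alpha * np.dot(layer_0.T, delta_1)
    return np.linalg.norm(step_w1), np.linalg.norm(step_w2)

rng = np.random.default_rng(0)  # arbitrary fixed seed
hidden_size = 16                # assumption: wider than the question's 5

# Same initialization range as in the question: uniform in [-0.1, 0.1)
w1 = 0.2 * rng.random((784, hidden_size)) - 0.1
w2 = 0.2 * rng.random((hidden_size, 10)) - 0.1

# Synthetic stand-in for one MNIST image (pixel values 0..255)
raw_image = rng.integers(0, 256, size=(1, 784)).astype(np.float64)
label = np.zeros((1, 10))
label[0, 3] = 1  # arbitrary one-hot target

raw_w1, raw_w2 = update_norms(raw_image, label, w1, w2)
scaled_w1, scaled_w2 = update_norms(raw_image / 255, label, w1, w2)

print("w1 update norm, raw vs scaled:", raw_w1, scaled_w1)
print("w2 update norm, raw vs scaled:", raw_w2, scaled_w2)
```

The weights start in the range [-0.1, 0.1), so an update that is many times larger than that overwrites them in a single step. As a side note, an error of exactly 1000 over 1000 one-hot samples is what you would get if the network output all zeros for every sample, which would be consistent with every hidden ReLU having gone inactive after those large updates. That is an inference from the numbers, not something checked against the original run.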
Upvotes: 1