Reputation: 13
I just got started with machine learning and tried to write a simple program in which a neural network learns the simple function y = f(x) = 2x.
Here's the code:
import numpy as np

# x is a 1D array of the integers 1 to 999
x = np.arange(1, 1000, 1)
y = x * 2

xtrain = x[:750]
ytrain = y[:750]
xtest = x[750:]
ytest = y[750:]
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten, Conv2D

model = Sequential()
model.add(Dense(128, input_dim=1, activation='relu'))
model.add(Dense(1, activation='relu'))

model.compile(loss='mean_squared_error',
              optimizer='sgd',
              metrics=['accuracy'])
model.summary()

history = model.fit(xtrain, ytrain,
                    batch_size=100,
                    epochs=20,
                    verbose=1,
                    validation_split=0.2)
I get the following output, no matter how I change the architecture or the hyperparameters:
79999/79999 [==============================] - 1s 13us/step - loss: 8533120007.8465 - acc: 0.0000e+00 - val_loss: 32532613324.8000 - val_acc: 0.0000e+00
The accuracy is 0 all the time. What am I doing wrong?
Upvotes: 1
Views: 780
Reputation: 11225
This is actually what you would expect if you blindly run gradient descent and expect it to learn any function. The behaviour you observe stems from two reasons:
1. For y = f(wx + b), the derivative of y with respect to w is f'(wx + b) * x by the chain rule. So when an update comes from an input that is extremely large / unnormalised, the gradient blows up. The update is basically w' = w - alpha * gradient, so the weight suddenly becomes very small, in fact negative.
2. Because of the relu in the final layer, the network then just outputs 0 and training stalls, since the derivative of relu is 0 when its input is negative.
You can reduce the data size to np.arange(1, 10) and reduce the number of hidden neurons to, say, 12 (more neurons make the output even more negative after a single update, as all their weights become negative as well) and you will be able to train the network.
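For reference, here is a minimal sketch of that suggestion, keeping the question's Sequential/Dense architecture and relu output layer; the epoch count and the final predict call are illustrative additions, not part of the answer itself.
# Sketch of the suggestion above: smaller inputs and only 12 hidden neurons,
# so a single SGD update cannot push all the weights negative at once.
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

x = np.arange(1, 10)   # small, well-behaved inputs
y = x * 2

model = Sequential()
model.add(Dense(12, input_dim=1, activation='relu'))   # 12 hidden neurons instead of 128
model.add(Dense(1, activation='relu'))                  # same output layer as in the question
model.compile(loss='mean_squared_error', optimizer='sgd')

model.fit(x, y, epochs=500, verbose=0)                  # epoch count is an illustrative choice
print(model.predict(np.array([4.0])))                   # should be close to 8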
Upvotes: 1
Reputation: 528
I think it works; check this out. I used randn instead of arange; other things are pretty much the same.
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras import optimizers

x = np.random.randn(1000)
y = x * 2

xtrain = x[0:750]
ytrain = y[0:750]

model = Sequential()
model.add(Dense(128, input_dim=1, activation='relu'))
model.add(Dense(1))
model.summary()

sgd = optimizers.SGD(lr=0.01, decay=1e-6)
model.compile(loss='mean_squared_error',
              optimizer=sgd,
              metrics=['mae'])

history = model.fit(xtrain, ytrain,
                    batch_size=100,
                    epochs=20,
                    verbose=1,
                    validation_split=0.2)
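To sanity-check the fit, you can predict on the held-out quarter of the same randn data; this check is an illustrative addition, not part of the original answer.
# Illustrative check: compare predictions on the held-out slice against y = 2x
xtest = x[750:]
ytest = y[750:]
preds = model.predict(xtest)
print(preds[:5].ravel())   # should roughly track ytest[:5]
print(ytest[:5])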
If you want to use the earlier dataset (i.e. arange), here is the accompanying code for that.
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras import optimizers

x = np.arange(1, 1000, 1)
y = x * 2

xtrain = x[0:750]
ytrain = y[0:750]

model = Sequential()
model.add(Dense(128, input_dim=1, activation='relu'))
model.add(Dense(1))
model.summary()

# Adam with a small learning rate copes better with the unnormalised inputs
adam = optimizers.Adam(lr=0.0001)
model.compile(loss='mean_squared_error',
              optimizer=adam,
              metrics=['mae'])

history = model.fit(xtrain, ytrain,
                    batch_size=100,
                    epochs=200,
                    verbose=1,
                    validation_split=0.2)
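As a quick check (an illustrative addition, not part of the original answer), you can evaluate on the held-out slice that the question defines as xtest / ytest.
# Illustrative check: evaluate on the remaining quarter of the arange data,
# matching the 750/250 split used in the question.
xtest = x[750:]
ytest = y[750:]
loss, mae = model.evaluate(xtest, ytest, verbose=0)
print('test MSE:', loss, 'test MAE:', mae)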
Upvotes: 0