Reputation: 53916
Here is a simple keras neural network that attempts to map 1->1 and 2->0 (binary classification)
X = [[1], [2]]
Y = [[1], [0]]
from keras.models import Sequential
from keras.layers import Dense
from keras.callbacks import History
from keras import optimizers
history = History()
inputDim = len(X[0])
print('input dim', inputDim)
model = Sequential()
model.add(Dense(1, activation='sigmoid', input_dim=inputDim))
model.add(Dense(1, activation='sigmoid'))
sgd = optimizers.SGD(lr=0.009, decay=1e-10, momentum=0.9, nesterov=True)
model.compile(loss='binary_crossentropy', optimizer=sgd, metrics=['accuracy'])
model.fit(X, Y, validation_split=0.1, verbose=2, callbacks=[history], epochs=20, batch_size=32)
Using the SGD optimizer:
optimizers.SGD(lr=0.009, decay=1e-10, momentum=0.9, nesterov=True)
Output for epoch 20:
Epoch 20/20
0s - loss: 0.5973 - acc: 1.0000 - val_loss: 0.4559 - val_acc: 0.0000e+00
If I use the Adam optimizer:
sgd = optimizers.adam(lr=0.009, decay=1e-10)
Output for epoch 20:
Epoch 20/20
0s - loss: 1.2140 - acc: 0.0000e+00 - val_loss: 0.2930 - val_acc: 1.0000
Switching between the Adam and SGD optimizers appears to reverse the values of acc and val_acc. With Adam, val_acc = 1 while acc = 0. How can validation accuracy be at its maximum while training accuracy is at its minimum?
Upvotes: 1
Views: 350
Reputation: 40516
Using sigmoid after sigmoid is a really bad idea. E.g. in this paper it is explained why sigmoid suffers from a so-called saturation problem. Moreover, when you stack sigmoid after sigmoid you push the overall saturation of your network sky-high. To understand why, notice that the output of the first layer always lies in the interval (0, 1). Since binary_crossentropy tries to push the final output toward 0 or 1, the linear transformation applied to that (0, 1) input must produce values close to +/- inf, which forces the second layer to have extremely large weights. This is probably what causes your instability.
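To see the effect numerically, here is a small numpy sketch (my illustration, not part of the original answer): the first sigmoid confines its output to (0, 1), so a second sigmoid stacked on top can only reach values near 0 or 1 if its weights and bias become huge.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

h = sigmoid(np.array([-5.0, 0.0, 5.0]))   # first-layer outputs, all confined to (0, 1)
print(sigmoid(1.0 * h))                   # modest weights: second sigmoid stuck around 0.50-0.73
print(sigmoid(50.0 * h - 25.0))           # only very large weights/bias push it toward 0 or 1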
To solve your problem, I would simply keep only one layer with a sigmoid activation, since your problem is linearly separable (see the sketch below).
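A minimal sketch of that one-layer model, reusing the hyperparameters from the question (the old-style Keras API with lr=, as in the original code, is assumed):
from keras.models import Sequential
from keras.layers import Dense
from keras import optimizers

model = Sequential()
# a single sigmoid unit is enough for a linearly separable problem
model.add(Dense(1, activation='sigmoid', input_dim=1))
sgd = optimizers.SGD(lr=0.009, decay=1e-10, momentum=0.9, nesterov=True)
model.compile(loss='binary_crossentropy', optimizer=sgd, metrics=['accuracy'])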
UPDATE: As @Daniel mentioned, when you apply validation_split to a dataset containing only two examples you end up with one example in the training set and the other in the validation set. This is what causes the weird accuracy values.
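One way to avoid the misleading split for this toy dataset (my own suggestion, not from the update above) is to drop validation_split entirely, since Keras takes the last, unshuffled fraction of the data as the validation set:
import numpy as np

X = np.array([[1], [2]])
Y = np.array([[1], [0]])

# with validation_split=0.1 the model would train on [[1]] only and
# validate on [[2]] only; training on both examples avoids that
model.fit(X, Y, epochs=20, batch_size=2, verbose=2)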
Upvotes: 1