Reputation: 21
I am porting a basic MNIST model training program from Keras 2.3.1 to tf.keras (TensorFlow 2.0) and am seeing some strange behavior. My initial code trains very well, but after updating my imports the training gets bogged down.
Here is my code:
# initialize the network (nn here is tensorflow.nn; conf holds my configuration values)
input_layer = Input(shape=(784,))
network1 = Dense(152, activation=nn.tanh)(input_layer)
network2 = Dense(76, activation=nn.tanh)(network1)
network3 = Dense(38, activation=nn.tanh)(network2)
network4 = Dense(4, activation=nn.tanh)(network3)  # bottleneck layer
network5 = Dense(38, activation=nn.tanh)(network4)
network6 = Dense(76, activation=nn.tanh)(network5)
network7 = Dense(152, activation=nn.tanh)(network6)
output = Dense(784, activation=nn.tanh)(network7)

autoencoder = Model(inputs=input_layer, outputs=output)
autoencoder.compile(optimizer='adadelta', loss='MSE')

# create a callback that saves the model's weights
cp_callback = ModelCheckpoint(filepath=conf.checkpoint_path,
                              save_weights_only=True,
                              verbose=1)

# run the model (an autoencoder is trained with the inputs as targets)
autoencoder.fit(x_data_train, x_data_train,
                epochs=conf.epochs,
                batch_size=conf.batch_size,
                shuffle=True,
                callbacks=[cp_callback])
My Keras imports, old:
from keras.layers import Input, Dense
from keras.models import Model
from keras.callbacks import ModelCheckpoint
My Keras imports, new:
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model
from tensorflow.keras.callbacks import ModelCheckpoint
After the first epoch of training with the old imports (the model finishes around a loss of 0.025 after 500 epochs):
Epoch 1/20 60000/60000 [==============================] - 11s 180us/step - loss: 0.0614
After the first epoch of training with the new imports (the model stalls around a loss of 0.06 after 500 epochs):
Epoch 1/20 60000/60000 [============================>.] - 10s 167us/step - loss: 0.1152
Upvotes: 0
Views: 158
Reputation: 21
After searching the internet for a few hours, I found a small comment here; that comment is the answer:
keras adadelta learning rate is 1.0, tf adadelta learning rate is 0.001. So latter learns much slower. You might want to adjust learning rate and test for consistency.
tf.keras Adadelta uses a default learning rate of 0.001, while standalone Keras uses 1.0, so the tf.keras version learns much more slowly. Updating my optimizer's learning rate worked:
from tensorflow.keras import optimizers

autoencoder.compile(optimizer=optimizers.Adadelta(learning_rate=1.0), loss='MSE')
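If you want to confirm the default yourself, you can inspect the optimizer's config directly; a quick sanity check (assuming the tf.keras imports above):

from tensorflow.keras import optimizers

opt = optimizers.Adadelta()
print(opt.get_config()["learning_rate"])  # prints 0.001 in tf.keras; standalone Keras defaults to 1.0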
Upvotes: 1
Reputation: 952
Have you checked your gradients? Training a neural network is non-deterministic, since a different set of weights is initialized each run. At the very least, try using a predefined random seed. You're also not using any regularization, so things can get out of hand quite a bit.
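For example, a minimal way to fix the seeds in TensorFlow 2.0 before building the model (the seed value 42 is arbitrary):

import numpy as np
import tensorflow as tf

np.random.seed(42)      # fixes NumPy-based randomness (e.g. data shuffling)
tf.random.set_seed(42)  # fixes TensorFlow's global seed for weight initialization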
Upvotes: 0