

Why is the loss of my autoencoder not going down at all during training?

I am following this tutorial to create a Keras-based autoencoder, but using my own data. That dataset includes about 20k training and about 4k validation images. All of them are very similar, all show the very same object. I haven't modified the Keras model layout from the tutorial, only changed the input size, since I used 300x300 images. So my model looks like this:

Model: "autoencoder"
Layer (type)                 Output Shape              Param #
input_1 (InputLayer)         [(None, 300, 300, 1)]     0
encoder (Functional)         (None, 16)                5779216
decoder (Functional)         (None, 300, 300, 1)       6176065
Total params: 11,955,281
Trainable params: 11,954,897
Non-trainable params: 384
Model: "encoder"
Layer (type)                 Output Shape              Param #
input_1 (InputLayer)         [(None, 300, 300, 1)]     0
conv2d (Conv2D)              (None, 150, 150, 32)      320
leaky_re_lu (LeakyReLU)      (None, 150, 150, 32)      0
batch_normalization (BatchNo (None, 150, 150, 32)      128
conv2d_1 (Conv2D)            (None, 75, 75, 64)        18496
leaky_re_lu_1 (LeakyReLU)    (None, 75, 75, 64)        0
batch_normalization_1 (Batch (None, 75, 75, 64)        256
flatten (Flatten)            (None, 360000)            0
dense (Dense)                (None, 16)                5760016
Total params: 5,779,216
Trainable params: 5,779,024
Non-trainable params: 192
Model: "decoder"
Layer (type)                 Output Shape              Param #
input_2 (InputLayer)         [(None, 16)]              0
dense_1 (Dense)              (None, 360000)            6120000
reshape (Reshape)            (None, 75, 75, 64)        0
conv2d_transpose (Conv2DTran (None, 150, 150, 64)      36928
leaky_re_lu_2 (LeakyReLU)    (None, 150, 150, 64)      0
batch_normalization_2 (Batch (None, 150, 150, 64)      256
conv2d_transpose_1 (Conv2DTr (None, 300, 300, 32)      18464
leaky_re_lu_3 (LeakyReLU)    (None, 300, 300, 32)      0
batch_normalization_3 (Batch (None, 300, 300, 32)      128
conv2d_transpose_2 (Conv2DTr (None, 300, 300, 1)       289
activation (Activation)      (None, 300, 300, 1)       0
Total params: 6,176,065
Trainable params: 6,175,873
Non-trainable params: 192

Then I initialize my model like this:

LR = 0.0001

(encoder, decoder, autoencoder) =, IMGSIZE, 1)
sched = ExponentialDecay(initial_learning_rate=LR, decay_steps=EPOCHS, decay_rate=LR / EPOCHS)
autoencoder.compile(loss="mean_squared_error", optimizer=Adam(learning_rate=sched))

Then I train my model like this:

image_generator = ImageDataGenerator(rescale=1.0 / 255)
train_gen = image_generator.flow_from_directory(
    os.path.join(args.images, "training"),
    target_size=(IMGSIZE, IMGSIZE),
)
val_gen = image_generator.flow_from_directory(
    os.path.join(args.images, "validation"),
    target_size=(IMGSIZE, IMGSIZE),
)
hist =, validation_data=val_gen, epochs=EPOCHS, batch_size=BS)

My batch size BS is 32 and I start with an initial Adam learning rate of 0.001 (but I also tried values like 0.1 down to 0.0001). I also tried to increase the latent dimensionality to something like 1024, but that doesn't solve my issue either.

During training, the loss drops from about 0.5 to about 0.2 in the first epoch. From the second epoch on it sticks at the very same value, e.g. 0.1989, and stays there "forever", regardless of how many epochs I train or which initial learning rate I use.

Any ideas what could be the problem here?


Answers (1)



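It could be that the decay_rate argument in tf.keras.optimizers.schedules.ExponentialDecay is decaying your learning rate much quicker than you think, effectively making it zero. The schedule computes initial_learning_rate * decay_rate ^ (step / decay_steps), where step counts optimizer updates (one per batch), not epochs. With decay_steps=EPOCHS and decay_rate=LR / EPOCHS, the learning rate is multiplied by the tiny factor LR / EPOCHS every EPOCHS batches, so it collapses well within the first epoch. That would match the loss improving during the first epoch and then freezing.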
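To get a feeling for the numbers, here is a rough sketch that evaluates the documented (non-staircase) ExponentialDecay formula in plain Python, without TensorFlow. LR comes from the question. EPOCHS = 50 is only an assumption, since the question does not state it, and STEPS_PER_EPOCH is estimated from "about 20k" images with a batch size of 32. The helper exponential_decay and the "gentler" settings at the end are hypothetical illustrations, not values taken from the tutorial.

```python
def exponential_decay(initial_lr, decay_rate, decay_steps, step):
    """Documented ExponentialDecay formula (staircase=False)."""
    return initial_lr * decay_rate ** (step / decay_steps)


LR = 0.0001                      # from the question
EPOCHS = 50                      # assumption: not stated in the question
STEPS_PER_EPOCH = 20_000 // 32   # about 20k images, batch size 32

# Schedule as configured in the question: step counts batches, not epochs.
for step in (0, EPOCHS, STEPS_PER_EPOCH):
    lr = exponential_decay(LR, LR / EPOCHS, EPOCHS, step)
    print(f"step {step:4d}: lr = {lr:.3e}")

# A gentler, purely illustrative alternative: shrink the learning rate
# by 4% once per epoch instead of by a factor of LR / EPOCHS.
gentle_lr = exponential_decay(LR, 0.96, STEPS_PER_EPOCH, STEPS_PER_EPOCH)
print(f"gentle schedule after one epoch: lr = {gentle_lr:.3e}")
```

Under these assumptions the configured schedule is already many orders of magnitude below the initial value by the end of the first epoch, while a decay_rate close to 1 with decay_steps measured in batches keeps it in a usable range. Training once with a constant learning rate (passing a plain float to Adam) should show whether the schedule is really the culprit.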
