Neural network isn't learning for the first few epochs on Keras

I'm testing simple networks in Keras with the TensorFlow backend and I ran into an issue with the sigmoid activation function.

The network isn't learning for the first 5-10 epochs, and then everything is fine. I tried using initializers and regularizers, but that only made it worse (roughly sketched below).
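For illustration, this is how the initializer/regularizer were attached to a layer (glorot_uniform and l2(1e-4) here are just example values, not necessarily the exact ones I tried):

import keras
from keras import regularizers

# example: explicit initializer + L2 weight regularizer on a hidden layer
layer = keras.layers.Dense(100,
                           activation='sigmoid',
                           kernel_initializer='glorot_uniform',       # example initializer
                           kernel_regularizer=regularizers.l2(1e-4))  # example regularizer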

I use the network like this:

import numpy as np
import keras
from numpy import expand_dims
from keras.preprocessing.image import ImageDataGenerator
from matplotlib import pyplot

import netowork2_ker  # my own module that provides load_data_shared()


# load the dataset (train/validation/test splits)
(x_train, y_train), (x_val, y_val), (x_test, y_test) = netowork2_ker.load_data_shared()

# reshape to (samples, 28, 28) and add a trailing channel axis -> (samples, 28, 28, 1)
x_train = expand_dims(x_train, 2)
x_train = np.reshape(x_train, (50000, 28, 28))
x_train = expand_dims(x_train, 3)

y_train = keras.utils.to_categorical(y_train, 10)
y_test = keras.utils.to_categorical(y_test, 10)

datagen = ImageDataGenerator(
    rescale=1./255,
    width_shift_range=[-1, 0, 1],
    height_shift_range=[-1, 0, 1],
    rotation_range=10)

epochs = 20
batch_size = 50
num_classes = 10

model = keras.Sequential()
model.add(keras.layers.Conv2D(64, (3, 3), padding='same',
                 input_shape=x_train.shape[1:],
                 activation='sigmoid'))
model.add(keras.layers.MaxPooling2D(pool_size=(2, 2)))
model.add(keras.layers.Conv2D(100, (3, 3),
                              activation='sigmoid'))
model.add(keras.layers.MaxPooling2D(pool_size=(2, 2)))
model.add(keras.layers.Flatten())
model.add(keras.layers.Dense(100,
                             activation='sigmoid'))
#model.add(keras.layers.Dropout(0.5))
model.add(keras.layers.Dense(num_classes,
                             activation='softmax'))

model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

model.fit_generator(datagen.flow(x_train, y_train, batch_size=batch_size),
                    steps_per_epoch=len(x_train) // batch_size, epochs=epochs,
                    verbose=2, shuffle=True)

With the code above I get results like these:

Epoch 1/20 
 - 55s - loss: 2.3098 - accuracy: 0.1036 
Epoch 2/20 
 - 56s - loss: 2.3064 - accuracy: 0.1038
Epoch 3/20 
 - 56s - loss: 2.3068 - accuracy: 0.1025
Epoch 4/20 
 - 56s - loss: 2.3060 - accuracy: 0.1079
...

This goes on for about 7 epochs (the exact number varies between runs), then the loss rapidly drops and I reach 0.9623 accuracy by epoch 20.

But if I change the activation from sigmoid to relu, it works great and gives me 0.5356 accuracy in the first epoch.
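For clarity, the only change is the activation argument on the hidden layers; the output layer keeps softmax:

model = keras.Sequential()
model.add(keras.layers.Conv2D(64, (3, 3), padding='same',
                              input_shape=x_train.shape[1:],
                              activation='relu'))
model.add(keras.layers.MaxPooling2D(pool_size=(2, 2)))
model.add(keras.layers.Conv2D(100, (3, 3), activation='relu'))
model.add(keras.layers.MaxPooling2D(pool_size=(2, 2)))
model.add(keras.layers.Flatten())
model.add(keras.layers.Dense(100, activation='relu'))
model.add(keras.layers.Dense(num_classes, activation='softmax'))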

This issue makes sigmoid almost unusable for me and I'd like to know if I can do something about it. Is this a bug or am I doing something wrong?

Upvotes: 1

Views: 984

Answers (1)

ayiyi

Reputation: 127

Activation function suggestion:

In practice, the sigmoid non-linearity has fallen out of favor and is rarely used. ReLU is the most common choice; if a large fraction of units in your network are "dead", try Leaky ReLU or tanh. Never use sigmoid. (A minimal Keras sketch of the Leaky ReLU option follows.)
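For example, a hidden layer with Leaky ReLU in Keras looks like this (the 0.1 slope is just an illustrative value):

import keras

# Leaky ReLU is applied as a separate layer after a linear Conv2D/Dense layer
model = keras.Sequential()
model.add(keras.layers.Conv2D(64, (3, 3), padding='same',
                              input_shape=(28, 28, 1)))  # no activation here
model.add(keras.layers.LeakyReLU(alpha=0.1))             # negative-side slope of 0.1
model.add(keras.layers.MaxPooling2D(pool_size=(2, 2)))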

Reasons for not using the sigmoid:

A very undesirable property of the sigmoid neuron is that when its activation saturates at either tail (near 0 or 1), the gradient in those regions is almost zero. In addition, sigmoid outputs are not zero-centered. A quick numeric check of the saturation effect is shown below.
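Plain NumPy, nothing Keras-specific:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # derivative of the sigmoid

# near zero the gradient is healthy; in the saturated tails it almost vanishes
print(sigmoid_grad(0.0))   # 0.25
print(sigmoid_grad(5.0))   # ~0.0066
print(sigmoid_grad(10.0))  # ~4.5e-05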

Upvotes: 3
