Neural network isn't learning for the first few epochs on Keras

I'm testing simple networks in Keras with the TensorFlow backend and I ran into an issue with the sigmoid activation function.

The network isn't learning for the first 5-10 epochs, and then everything is fine. I tried using initializers and regularizers, but that only made it worse (roughly sketched below).
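For illustration, this is how the initializer/regularizer were attached to a layer (glorot_uniform and l2(1e-4) here are just example values, not necessarily the exact ones I tried):

import keras
from keras import regularizers

# example: explicit initializer + L2 weight regularizer on a hidden layer
layer = keras.layers.Dense(100,
                           activation='sigmoid',
                           kernel_initializer='glorot_uniform',       # example initializer
                           kernel_regularizer=regularizers.l2(1e-4))  # example regularizer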

I use the network like this:

import numpy as np
import keras
from numpy import expand_dims
from keras.preprocessing.image import ImageDataGenerator
from matplotlib import pyplot

import netowork2_ker  # my own module that provides load_data_shared()


# load the dataset (train/validation/test splits)
(x_train, y_train), (x_val, y_val), (x_test, y_test) = netowork2_ker.load_data_shared()

# reshape to (samples, 28, 28) and add a trailing channel axis -> (samples, 28, 28, 1)
x_train = expand_dims(x_train, 2)
x_train = np.reshape(x_train, (50000, 28, 28))
x_train = expand_dims(x_train, 3)

y_train = keras.utils.to_categorical(y_train, 10)
y_test = keras.utils.to_categorical(y_test, 10)

datagen = ImageDataGenerator(
    rescale=1./255,
    width_shift_range=[-1, 0, 1],
    height_shift_range=[-1, 0, 1],
    rotation_range=10)

epochs = 20
batch_size = 50
num_classes = 10

model = keras.Sequential()
model.add(keras.layers.Conv2D(64, (3, 3), padding='same',
                 input_shape=x_train.shape[1:],
                 activation='sigmoid'))
model.add(keras.layers.MaxPooling2D(pool_size=(2, 2)))
model.add(keras.layers.Conv2D(100, (3, 3),
                              activation='sigmoid'))
model.add(keras.layers.MaxPooling2D(pool_size=(2, 2)))
model.add(keras.layers.Flatten())
model.add(keras.layers.Dense(100,
                             activation='sigmoid'))
#model.add(keras.layers.Dropout(0.5))
model.add(keras.layers.Dense(num_classes,
                             activation='softmax'))

model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

model.fit_generator(datagen.flow(x_train, y_train, batch_size=batch_size),
                    steps_per_epoch=len(x_train) // batch_size, epochs=epochs,
                    verbose=2, shuffle=True)

With the code above I get results like these:

Epoch 1/20 
 - 55s - loss: 2.3098 - accuracy: 0.1036 
Epoch 2/20 
 - 56s - loss: 2.3064 - accuracy: 0.1038
Epoch 3/20 
 - 56s - loss: 2.3068 - accuracy: 0.1025
Epoch 4/20 
 - 56s - loss: 2.3060 - accuracy: 0.1079
...

This goes on for about 7 epochs (the exact number varies between runs), then the loss rapidly drops and I reach 0.9623 accuracy by epoch 20.

But if I change the activation from sigmoid to relu, it works great and gives me 0.5356 accuracy in the first epoch.
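For clarity, the only change is the activation argument on the hidden layers; the output layer keeps softmax:

model = keras.Sequential()
model.add(keras.layers.Conv2D(64, (3, 3), padding='same',
                              input_shape=x_train.shape[1:],
                              activation='relu'))
model.add(keras.layers.MaxPooling2D(pool_size=(2, 2)))
model.add(keras.layers.Conv2D(100, (3, 3), activation='relu'))
model.add(keras.layers.MaxPooling2D(pool_size=(2, 2)))
model.add(keras.layers.Flatten())
model.add(keras.layers.Dense(100, activation='relu'))
model.add(keras.layers.Dense(num_classes, activation='softmax'))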

This issue makes sigmoid almost unusable for me and I'd like to know if I can do something about it. Is this a bug or am I doing something wrong?

Upvotes: 1

Views: 984

Answers (1)

ayiyi

Reputation: 127

Activation function suggestion:

In practice, the sigmoid non-linearity has fallen out of favor and is rarely used. ReLU is the most common choice; if a large fraction of units in your network are "dead", try Leaky ReLU or tanh. Never use sigmoid. (A minimal Keras sketch of the Leaky ReLU option follows.)
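For example, a hidden layer with Leaky ReLU in Keras looks like this (the 0.1 slope is just an illustrative value):

import keras

# Leaky ReLU is applied as a separate layer after a linear Conv2D/Dense layer
model = keras.Sequential()
model.add(keras.layers.Conv2D(64, (3, 3), padding='same',
                              input_shape=(28, 28, 1)))  # no activation here
model.add(keras.layers.LeakyReLU(alpha=0.1))             # negative-side slope of 0.1
model.add(keras.layers.MaxPooling2D(pool_size=(2, 2)))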

Reasons for not using the sigmoid:

A very undesirable property of the sigmoid neuron is that when its activation saturates at either tail (near 0 or 1), the gradient in those regions is almost zero. In addition, sigmoid outputs are not zero-centered. A quick numeric check of the saturation effect is shown below.
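Plain NumPy, nothing Keras-specific:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # derivative of the sigmoid

# near zero the gradient is healthy; in the saturated tails it almost vanishes
print(sigmoid_grad(0.0))   # 0.25
print(sigmoid_grad(5.0))   # ~0.0066
print(sigmoid_grad(10.0))  # ~4.5e-05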

Upvotes: 3
