Farhan Rabbaanii

Reputation: 463

validation loss sometimes spiking

I want to detect which images are genuine and which are spoofs, and I have roughly 8,000 images in total (both classes combined). I trained the model with LR = 1e-4, BS = 32, EPOCHS = 100, and the result is shown below. Sometimes my validation loss spikes, but afterwards it comes back below the training loss line. What is happening with my model? Any answer would be appreciated. Thanks in advance!

training result graph

    # Keras imports needed for this snippet; height, width, depth and
    # classes are expected to be defined by the caller
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import (Activation, BatchNormalization, Conv2D,
                                         Dense, Dropout, Flatten, MaxPooling2D)
    from tensorflow.keras import backend as K

    model = Sequential()
    inputShape = (height, width, depth)
    chanDim = -1

    # if we are using "channels first", update the input shape
    # and channels dimension
    if K.image_data_format() == "channels_first":
        inputShape = (depth, height, width)
        chanDim = 1

    # first CONV => RELU => CONV => RELU => POOL layer set
    model.add(Conv2D(16, (3, 3), padding="same",
        input_shape=inputShape))
    model.add(Activation("relu"))
    model.add(BatchNormalization(axis=chanDim))
    model.add(Conv2D(16, (1, 1), padding="same"))
    model.add(Activation("relu"))
    model.add(BatchNormalization(axis=chanDim))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))

    # second CONV => RELU => CONV => RELU => POOL layer set
    model.add(Conv2D(32, (1, 1), padding="same"))
    model.add(Activation("relu"))
    model.add(BatchNormalization(axis=chanDim))
    model.add(Conv2D(32, (1, 1), padding="same"))
    model.add(Activation("relu"))
    model.add(BatchNormalization(axis=chanDim))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))

    # fully connected => RELU layer set
    model.add(Flatten())
    model.add(Dense(64))
    model.add(Activation("relu"))
    model.add(BatchNormalization())
    model.add(Dropout(0.5))

    # softmax classifier
    model.add(Dense(classes))
    model.add(Activation("softmax"))

Upvotes: 4

Views: 4730

Answers (1)

Pasa

Reputation: 722

Imagine the loss function as a surface with as many dimensions as there are free parameters in your model. Each point on this surface corresponds to a set of parameter values and is associated with a loss value (which you are trying to minimize). I assume you are training this CNN model with some sort of gradient descent / backpropagation algorithm (all Keras optimizers fall into this category).

In this setup, the gradient estimates will invariably be noisy, since your training data is not a full sample of the entire sample space (it does not contain all possible input values, which would be intractable for real-world problems anyway) and may not have a distribution that exactly matches that of the validation set. You are computing an estimate of the gradient from an incomplete, finite sample of a (possibly infinite) universe. Therefore, each step is not going to point exactly in the direction that minimizes the true loss function, but will hopefully be close enough for the model to converge to a useful solution. Even if you could somehow compute the exact gradient, some algorithms will by design not follow it exactly (for instance, ones that use momentum). Also, even a step along the exact gradient direction can increase the loss value due to overshoot, especially with larger learning rates.
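
To make the overshoot point concrete, here is a tiny made-up example on a plain quadratic (nothing to do with your CNN): a single exact gradient step with a large learning rate increases the loss instead of decreasing it.

    # f(x) = x^2 has its minimum at x = 0; take one gradient step from x = 1
    f      = lambda x: x**2
    grad_f = lambda x: 2*x

    for lr in (0.1, 1.2):
        x = 1.0
        x_next = x - lr * grad_f(x)
        print(f"lr={lr}: loss {f(x):.2f} -> {f(x_next):.2f}")
    # lr=0.1: loss 1.00 -> 0.64  (decreases)
    # lr=1.2: loss 1.00 -> 1.96  (overshoots and increases)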

Using mini-batches (the batch size is chosen in the call to model.fit(); in your case, 32) also introduces additional noise, since the gradient for each weight update is computed not on all available training data but only on a limited subset (the batch). This extra noise is a small price to pay for the considerable speedup that mini-batching yields, which in practice leads to faster convergence.
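
One quick way to see this batch-level noise is to compare the gradient computed on the full training set with gradients computed on random batches of 32. The numbers below come from a made-up least-squares problem, not from your model, but the effect is the same:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=8000)                 # ~8000 samples, as in your dataset
    y = 3.0 * X + rng.normal(scale=0.5, size=8000)

    def mse_grad(w, idx):
        # d/dw of the mean squared error of y_hat = w * x over the rows in idx
        return np.mean(2.0 * (w * X[idx] - y[idx]) * X[idx])

    w = 0.0
    full_grad = mse_grad(w, np.arange(X.size))
    batch_grads = [mse_grad(w, rng.choice(X.size, size=32, replace=False))
                   for _ in range(8)]

    print("full-batch gradient :", round(float(full_grad), 3))
    print("32-sample gradients :", [round(float(g), 3) for g in batch_grads])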

In fact, some noise is actually desirable, as it may help the optimizer escape local minima, as shown below (toy example):

Perfect gradient descent vs. Noisy gradient descent
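
If you want to reproduce the idea behind that figure, here is a 1-D sketch with an invented two-minimum function (again, nothing to do with your actual loss): exact gradient descent started in the shallow basin stays stuck there, while the same descent with artificially noisy gradients usually hops over the barrier into the deeper basin.

    import numpy as np

    f      = lambda x: x**4 - 3*x**2 + x      # shallow minimum near x = +1.1, deeper one near x = -1.3
    grad_f = lambda x: 4*x**3 - 6*x + 1

    def descend(x, lr=0.01, noise_std=0.0, steps=3000, seed=0):
        rng = np.random.default_rng(seed)
        for _ in range(steps):
            x -= lr * (grad_f(x) + noise_std * rng.normal())   # noisy gradient estimate
        return x

    start = 1.0                               # begin inside the shallow basin
    print("exact gradient ends near x =", round(descend(start), 2))

    ends = [descend(start, noise_std=10.0, seed=s) for s in range(20)]
    print("noisy-gradient runs ending in the deeper basin (x < 0):",
          sum(e < 0 for e in ends), "out of 20")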

So, to answer your question: it is perfectly normal to see spikes like the ones in your plot during training, in both the validation and the training losses, for the reasons stated above and because both the training and validation sets are finite, incomplete samples of the sample space. I suspect the validation metrics may be even noisier, since the validation set is usually a much smaller sample than the training set and is not the target of the optimization (so the training and validation distributions may not overlap completely).

Upvotes: 5
