Arun

Reputation: 2478

TensorFlow SGD decay parameter

I am using TensorFlow 2.4.1 and Python 3.8 for computer vision CNN models such as VGG-18 and ResNet-18/34. My question is specifically about how to declare weight decay. There seem to be two ways of defining it:

  1. The first is to declare it for each layer using the 'kernel_regularizer' parameter of the 'Conv2D' layer
  2. The second is to use the 'decay' parameter of the TF SGD optimizer

Example code:

import tensorflow as tf
from tensorflow.keras import regularizers
from tensorflow.keras.layers import Conv2D

weight_decay = 0.0005
lr_decay = 1e-4  # placeholder value; the original post does not define lr_decay

# NOTE: this 'kernel_regularizer' parameter is used for all of the conv layers in ResNet-18/34 and VGG-18 models
conv = Conv2D(
    filters=64, kernel_size=(3, 3),
    activation='relu', kernel_initializer=tf.initializers.he_normal(),
    strides=(1, 1), padding='same',
    kernel_regularizer=regularizers.l2(weight_decay),
)

optimizer = tf.keras.optimizers.SGD(learning_rate=0.01, decay=lr_decay, momentum=0.9)

My questions are:

  1. Do these two techniques for applying weight decay do the same thing? If yes, only one should be used to avoid redundancy.
  2. If not, does using both of these weight decay definitions apply twice the weight decay? Too much regularization would push even the helpful weights towards zero, so in essence the model would not learn the desired function.

Upvotes: 3

Views: 3946

Answers (1)

David Thery

Reputation: 719

The 'decay' argument has been deprecated for all optimizers since Keras 2.3. Note that it controls learning rate decay, not weight decay. For learning rate decay, you should use a LearningRateSchedule instead.
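As a minimal sketch of what that could look like, the snippet below passes an InverseTimeDecay schedule to SGD. The values of initial_lr and lr_decay are placeholders chosen for illustration, not taken from the question, and picking decay_steps=1 is an assumption made so that the schedule has the same inverse-time form as the legacy 'decay' argument.

```python
import tensorflow as tf

# Placeholder values for illustration only; they are not from the question.
initial_lr = 0.01
lr_decay = 1e-4

# InverseTimeDecay computes initial_lr / (1 + decay_rate * step / decay_steps).
# With decay_steps=1 the learning rate shrinks a little on every optimizer step.
lr_schedule = tf.keras.optimizers.schedules.InverseTimeDecay(
    initial_learning_rate=initial_lr,
    decay_steps=1,
    decay_rate=lr_decay,
)

# The schedule replaces the fixed learning rate; no 'decay' argument is needed.
optimizer = tf.keras.optimizers.SGD(learning_rate=lr_schedule, momentum=0.9)
```

Other schedules under tf.keras.optimizers.schedules (for example ExponentialDecay) can be swapped in the same way if a different decay shape suits your training run better.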

As for your questions:

  1. Partially agreed. In a deep neural network, you could apply a stronger decay only to the "surface" layers while using a LearningRateSchedule for a smoother overall decay. This is probably a matter of experimentation, but in general I agree with the rule of reducing complexity.
  2. Again, I would say it depends on the number of layers. Unless you have only a single layer, I don't see why the two decays would add up, as they act on different scales of your network.
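To see why the two settings are not interchangeable, here is a small plain-Python sketch on a single scalar weight. It assumes plain SGD without momentum and a flat data loss (zero gradient) so that each mechanism can be seen in isolation. The helper names sgd_step and decayed_lr are hypothetical, written only for this illustration. The L2 term follows the Keras convention of adding l2 * w**2 to the loss, and the decay formula is the inverse-time form used by the legacy 'decay' argument.

```python
def sgd_step(w, grad, lr, l2=0.0):
    """One plain SGD update on a scalar weight (no momentum).

    An L2 penalty of l2 * w**2 in the loss adds 2 * l2 * w to the gradient.
    """
    return w - lr * (grad + 2.0 * l2 * w)


def decayed_lr(initial_lr, decay, step):
    """Inverse-time learning rate decay: initial_lr / (1 + decay * step)."""
    return initial_lr / (1.0 + decay * step)


w = 1.0
data_grad = 0.0  # flat data loss, chosen to isolate each mechanism

# L2 penalty only: the weight shrinks even though the data gradient is zero.
w_l2 = sgd_step(w, data_grad, lr=0.01, l2=0.0005)

# Learning rate decay only: the step size shrinks, but with a zero data
# gradient there is nothing to step along, so the weight does not move.
w_lr = sgd_step(w, data_grad, lr=decayed_lr(0.01, decay=1e-4, step=1000))

print(w_l2)
print(w_lr)
```

In this toy setting only the L2 penalty pulls the weight towards zero; the learning rate decay just scales how large each update is.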

Why not run the different configurations and compare?

Upvotes: 3
