Reputation: 602
Consider the following model, built using the tf.keras API, where I used kernel_regularizer=tf.keras.regularizers.l2(l2) on the penultimate layer, just before the sigmoid layer, for binary classification:
import tensorflow as tf

# l2 is a float hyperparameter holding the regularization strength
model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(input_shape=(224, 224, 3), filters=32, kernel_size=(3, 3), strides=(1, 1), padding='same', activation='relu'),
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
    tf.keras.layers.Conv2D(filters=64, kernel_size=(3, 3), strides=(1, 1), padding='same', activation='relu'),
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(units=512, activation='relu', kernel_regularizer=tf.keras.regularizers.l2(l2)),
    tf.keras.layers.Dense(units=1, activation='sigmoid'),
])
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
What exactly does the kernel_regularizer parameter in a tf.keras.layers.Layer implement in terms of the loss function being optimized? Is it just adding the regularization penalty, i.e. l2 * sum(w**2), to the loss function as is traditionally taught? And is it doing that with respect to all of the network's weights, or just that layer's?
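For reference, this is the traditional formulation I have in mind (a rough sketch in plain Python, not Keras internals; lam is the regularization strength and weights are whichever parameters get penalized):

import numpy as np

def penalized_loss(data_loss, weights, lam):
    # L2-penalized objective as traditionally taught:
    # data loss plus lam times the sum of squared weights
    return data_loss + lam * sum(np.sum(w ** 2) for w in weights)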
Upvotes: 0
Views: 5300
Reputation: 56377
Yes, it just adds the regularization penalty to the loss, but only with respect to that layer's weights; you can see that here. This lets you control which layers are regularized, and you can even use a different regularization strength for each layer.
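You can verify this by inspecting model.losses, which collects the per-layer penalty terms that Keras adds to the data loss during training. A minimal sketch (the layer sizes and the 0.01 strength here are arbitrary choices, not taken from your model):

import tensorflow as tf

model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(4, activation='relu', input_shape=(8,),
                          kernel_regularizer=tf.keras.regularizers.l2(0.01)),
    tf.keras.layers.Dense(1, activation='sigmoid'),  # no regularizer here
])

# One penalty term per regularized layer; the second Dense contributes nothing.
print(len(model.losses))  # 1

# The term is exactly 0.01 * sum(kernel ** 2), computed on that layer's kernel only.
kernel = model.layers[0].kernel
manual = 0.01 * tf.reduce_sum(tf.square(kernel))
print(float(model.losses[0]), float(manual))  # same value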
Upvotes: 4