Reputation: 1122
From the TensorFlow documentation, I see there are a few ways of applying L1 regularisation. The first is the most intuitive to me. This example behaves as expected: d1 holds sixteen 3's, which sum to 48, and scaled by 0.1 we get 4.8 as the loss.
import tensorflow as tf

d1 = tf.ones(shape=(2, 2, 4)) * 3          # 16 entries, all equal to 3
regularizer = tf.keras.regularizers.l1(0.1)
regularizer(d1)
<tf.Tensor: shape=(), dtype=float32, numpy=4.8>
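This matches the penalty computed by hand (continuing from the snippet above; tf.reduce_sum and tf.abs just spell out what the L1 regularizer does internally):
0.1 * tf.reduce_sum(tf.abs(d1))  # 4.8, the same value the regularizer returns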
In the second way, the regularisation is applied to the kernel, so I'm guessing it encourages sparsity of the model weights. I can't tell exactly how the loss of 0.54747146 comes about.
layer = tf.keras.layers.Dense(3, input_dim=(2, 2, 4), kernel_regularizer=tf.keras.regularizers.l1(0.1))
out = layer(d1)
layer.losses
[<tf.Tensor: shape=(), dtype=float32, numpy=0.54747146>]
I believed the third way should give the same result as the first way, where the regularizer is applied directly to the tensor. Here we use activity_regularizer, which the documentation describes as a "Regularizer to apply a penalty on the layer's output."
layer2 = tf.keras.layers.Dense(3, input_dim=(2, 2, 4), activity_regularizer=tf.keras.regularizers.l1(0.1))
out2 = layer2(d1)
layer2.losses
[<tf.Tensor: shape=(), dtype=float32, numpy=1.4821562>]
The documentation also notes: "The value returned by the activity_regularizer is divided by the input batch size..."
Why is the loss 1.4821562? It comes out different every time I rerun it. How do the third and first ways differ?
If I want to encourage sparsity of d1, which should I use?
Upvotes: 1
Views: 458
Reputation: 91
What your dense layer is calculating is the matrix product y = Wx + b. Your three different ways of applying L1 calculate:
l1(x)
l1(W)
l1(Wx + b)
Since the weights and biases are randomly generated, they will be different for each run unless you specify a fixed seed.
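Here is a minimal sketch to verify this (assuming TF 2.x in eager mode; the layer and variable names are just illustrative). With a fixed seed the numbers become reproducible, and each loss Keras reports can be recomputed by hand:

import tensorflow as tf

tf.random.set_seed(0)  # fix the global seed so the random kernel is reproducible across runs

d1 = tf.ones(shape=(2, 2, 4)) * 3
reg = tf.keras.regularizers.l1(0.1)

# way 1: penalty on the tensor itself, 0.1 * sum(|x|) = 0.1 * 16 * 3 = 4.8
print(reg(d1))

# way 2: penalty on the kernel W only (the bias is not regularized here)
layer = tf.keras.layers.Dense(3, kernel_regularizer=reg)
_ = layer(d1)  # builds the layer; W has shape (4, 3)
print(layer.losses[0])                            # value reported by Keras
print(0.1 * tf.reduce_sum(tf.abs(layer.kernel)))  # same value, computed by hand

# way 3: penalty on the output Wx + b, divided by the batch size (2 here)
layer2 = tf.keras.layers.Dense(3, activity_regularizer=reg)
out2 = layer2(d1)
print(layer2.losses[0])                                 # value reported by Keras
print(0.1 * tf.reduce_sum(tf.abs(out2)) / d1.shape[0])  # same value, computed by hand

Only l1(W) and l1(Wx + b) depend on the random initialisation, which is why the first value (4.8) is stable while the other two change from run to run.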
Upvotes: 1