momo

Reputation: 1122

How to encourage sparsity of a tensor using L1 regularisation

From the TensorFlow documentation, I see there are a few ways of applying L1 regularisation. The first is the most intuitive to me. This example behaves as expected: d1 is filled with 3's, which sum to 48, and scaling by 0.1 gives a loss of 4.8.

import tensorflow as tf

d1 = tf.ones(shape=(2,2,4))*3
regularizer = tf.keras.regularizers.l1(0.1)
regularizer(d1)

<tf.Tensor: shape=(), dtype=float32, numpy=4.8>

In the second way, the regularisation is applied to the kernel, so I'm guessing it encourages sparsity of the model weights. I can't tell exactly how the loss of 0.54747146 comes about.

layer = tf.keras.layers.Dense(3,input_dim=(2,2,4),kernel_regularizer=tf.keras.regularizers.l1(0.1))
out = layer(d1)
layer.losses

<tf.Tensor: shape=(), dtype=float32, numpy=0.54747146>

I believed the third way should give the same result as the first, where the regularisation is applied directly to the tensor. Here we use activity_regularizer to apply a penalty on the layer's output.

layer2 = tf.keras.layers.Dense(3,input_dim=(2,2,4),activity_regularizer=tf.keras.regularizers.l1(0.1))
out2=layer2(d1)
layer2.losses

<tf.Tensor: shape=(), dtype=float32, numpy=1.4821562>

From the documentation: "The value returned by the activity_regularizer is divided by the input batch size..."

Why is the loss 1.4821562? It is different every time I rerun the code. How do the third and first ways differ?

If I want to encourage sparsity of d1, which should I use?

Upvotes: 1

Views: 458

Answers (1)

ngc92

Reputation: 91

What your dense layer computes is y = Wx + b. Your three different ways of applying L1 therefore calculate (a small verification sketch follows the list):

  1. l1(x)
  2. l1(W)
  3. l1(Wx + b)
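
A minimal sketch of recomputing each penalty by hand (using the question's tensor shape and a 0.1 coefficient; the variable names here are illustrative rather than the question's exact code):

import tensorflow as tf

d1 = tf.ones(shape=(2, 2, 4)) * 3
l1 = tf.keras.regularizers.l1(0.1)

# Way 1: penalty on the tensor itself: 0.1 * sum(|d1|) = 0.1 * 16 * 3 = 4.8
print(l1(d1))

# Way 2: penalty on the kernel: 0.1 * sum(|W|), which is what layer.losses reports
layer = tf.keras.layers.Dense(3, kernel_regularizer=l1)
_ = layer(d1)
print(0.1 * tf.reduce_sum(tf.abs(layer.kernel)), layer.losses)

# Way 3: penalty on the output, based on 0.1 * sum(|Wx + b|). Per the documentation
# note quoted in the question, the value Keras reports may additionally be divided
# by the batch size (here d1.shape[0] == 2), depending on the TF/Keras version.
layer2 = tf.keras.layers.Dense(3, activity_regularizer=l1)
out2 = layer2(d1)
print(l1(out2), layer2.losses)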

Since the weights and biases are randomly generated, they will be different for each run unless you specify a fixed seed.
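For example, fixing the seed(s) makes the reported loss reproducible; a minimal sketch (the seed values are arbitrary):

import tensorflow as tf

tf.random.set_seed(0)  # fixes the global seed used by default initialisers

d1 = tf.ones(shape=(2, 2, 4)) * 3
layer = tf.keras.layers.Dense(
    3,
    kernel_regularizer=tf.keras.regularizers.l1(0.1),
    # alternatively, seed the initialiser itself for per-layer reproducibility
    kernel_initializer=tf.keras.initializers.GlorotUniform(seed=42),
)
_ = layer(d1)
print(layer.losses)  # the same value on every run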

Upvotes: 1
