Reputation: 177
TensorFlow Probability layers (e.g. DenseFlipout) have a losses property which returns the "losses associated with this layer." Can someone explain what these losses are?
After browsing the Flipout paper, I think the losses refer to the Kullback-Leibler divergence between the prior and posterior distributions of the weights and biases. If someone more knowledgeable about these things than I am can confirm or correct this, please do.
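For concreteness, here is a minimal sketch of the kind of inspection I have in mind (assuming a TF 2.x / recent TFP setup where the layer can be called eagerly; under TF 1.x graph mode you would see symbolic tensors instead, and the shapes below are arbitrary):

import tensorflow as tf
import tensorflow_probability as tfp

layer = tfp.layers.DenseFlipout(10)
x = tf.random.normal([4, 5])  # arbitrary toy batch
_ = layer(x)                  # calling the layer registers its divergence term(s) via add_loss
print(layer.losses)           # what exactly are these tensors?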
Upvotes: 4
Views: 622
Reputation: 1383
Your suspicion is correct, although it is poorly documented. For example, in the piece of code below
import tensorflow as tf
import tensorflow_probability as tfp

# `features` and `labels` are assumed to be defined elsewhere (e.g. by an input pipeline).
model = tf.keras.Sequential([
    tfp.layers.DenseFlipout(512, activation=tf.nn.relu),
    tfp.layers.DenseFlipout(10),
])
logits = model(features)
neg_log_likelihood = tf.nn.softmax_cross_entropy_with_logits(
    labels=labels, logits=logits)
kl = sum(model.losses)  # The KL terms registered by each DenseFlipout layer are summed
# The negative log-likelihood and the KL term are combined into the training loss
loss = neg_log_likelihood + kl
train_op = tf.train.AdamOptimizer().minimize(loss)
provided in the documentation of the DenseFlipout layer, the losses are summed to get the KL term, while the negative log-likelihood term is computed separately; the two are then combined to form the (negative) ELBO that is minimized.
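As an aside, if you train through Keras compile()/fit() rather than assembling the training op by hand, Keras adds sum(model.losses) (i.e. the KL terms) to the compiled loss automatically, so you only need to supply the negative log-likelihood part. A rough sketch under that assumption (TF 2.x style; x_train/y_train stand in for your data, and any per-example scaling of the KL is omitted):

import tensorflow as tf
import tensorflow_probability as tfp

model = tf.keras.Sequential([
    tfp.layers.DenseFlipout(512, activation=tf.nn.relu),
    tfp.layers.DenseFlipout(10),
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(),
    # Keras adds the layer KL losses on top of this data-fit loss during training.
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=['accuracy'],
)
# model.fit(x_train, y_train, epochs=10)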
You can see the loss being added here which, following a few indirections, reveals that the {kernel,bias}_divergence_fn is being used, and that in turn defaults to a lambda that calls tfd.kl_divergence(q, p).
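If you want different behaviour, both divergence functions can be overridden. A common tweak (a sketch; NUM_TRAIN_EXAMPLES is a hypothetical stand-in for your dataset size) is to scale the KL by the number of training examples so it is weighted per example, matching a per-example likelihood term:

import tensorflow_probability as tfp
tfd = tfp.distributions

NUM_TRAIN_EXAMPLES = 60000  # hypothetical dataset size

# Same signature as the default divergence_fn; the third argument
# (a sampled posterior tensor) is ignored, just like in the default.
scaled_kl = lambda q, p, _: tfd.kl_divergence(q, p) / NUM_TRAIN_EXAMPLES

layer = tfp.layers.DenseFlipout(
    10,
    kernel_divergence_fn=scaled_kl,
    bias_divergence_fn=scaled_kl,
)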
Upvotes: 2