Reputation: 177
TensorFlow Probability layers (e.g. DenseFlipout) have a losses property which returns the "losses associated with this layer." Can someone explain what these losses are?
After browsing the Flipout paper, I think the losses refer to the Kullback-Leibler divergence between the prior and posterior distributions of the weights and biases. If someone more knowledgeable about these things than I am can confirm or correct this, please do.
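For concreteness, here is a minimal sketch of the kind of inspection I have in mind (assuming a TF 2.x / recent TFP setup where the layer can be called eagerly; under TF 1.x graph mode you would see symbolic tensors instead, and the shapes below are arbitrary):

import tensorflow as tf
import tensorflow_probability as tfp

layer = tfp.layers.DenseFlipout(10)
x = tf.random.normal([4, 5])  # arbitrary toy batch
_ = layer(x)                  # calling the layer registers its divergence term(s) via add_loss
print(layer.losses)           # what exactly are these tensors?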
Upvotes: 4
Views: 622
Reputation: 1383
Your suspicion is correct, although it is poorly documented. For example, in the piece of code below
import tensorflow as tf
import tensorflow_probability as tfp

# `features` and `labels` are assumed to be defined elsewhere (e.g. by an input pipeline).
model = tf.keras.Sequential([
    tfp.layers.DenseFlipout(512, activation=tf.nn.relu),
    tfp.layers.DenseFlipout(10),
])
logits = model(features)
neg_log_likelihood = tf.nn.softmax_cross_entropy_with_logits(
    labels=labels, logits=logits)
kl = sum(model.losses)  # The KL terms registered by each DenseFlipout layer are summed
# The negative log-likelihood and the KL term are combined into the training loss
loss = neg_log_likelihood + kl
train_op = tf.train.AdamOptimizer().minimize(loss)
provided in the documentation of the DenseFlipout layer, the losses are summed to get the KL term, while the negative log-likelihood term is computed separately; the two are then combined to form the (negative) ELBO that is minimized.
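As an aside, if you train through Keras compile()/fit() rather than assembling the training op by hand, Keras adds sum(model.losses) (i.e. the KL terms) to the compiled loss automatically, so you only need to supply the negative log-likelihood part. A rough sketch under that assumption (TF 2.x style; x_train/y_train stand in for your data, and any per-example scaling of the KL is omitted):

import tensorflow as tf
import tensorflow_probability as tfp

model = tf.keras.Sequential([
    tfp.layers.DenseFlipout(512, activation=tf.nn.relu),
    tfp.layers.DenseFlipout(10),
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(),
    # Keras adds the layer KL losses on top of this data-fit loss during training.
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=['accuracy'],
)
# model.fit(x_train, y_train, epochs=10)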
You can see the loss being added here which, following a few indirections, reveals that the {kernel,bias}_divergence_fn is being used, and that in turn defaults to a lambda that calls tfd.kl_divergence(q, p).
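If you want different behaviour, both divergence functions can be overridden. A common tweak (a sketch; NUM_TRAIN_EXAMPLES is a hypothetical stand-in for your dataset size) is to scale the KL by the number of training examples so it is weighted per example, matching a per-example likelihood term:

import tensorflow_probability as tfp
tfd = tfp.distributions

NUM_TRAIN_EXAMPLES = 60000  # hypothetical dataset size

# Same signature as the default divergence_fn; the third argument
# (a sampled posterior tensor) is ignored, just like in the default.
scaled_kl = lambda q, p, _: tfd.kl_divergence(q, p) / NUM_TRAIN_EXAMPLES

layer = tfp.layers.DenseFlipout(
    10,
    kernel_divergence_fn=scaled_kl,
    bias_divergence_fn=scaled_kl,
)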
Upvotes: 2