Reputation: 137
I am trying to use TensorFlow Probability to implement Bayesian deep learning for a bioinformatics regression task. The closest analogy in traditional data science would be image scoring, where the model attempts to predict a label (a float value) as close to the true label as possible.
I have previously trained a model on my dataset with ordinary dense layers in TensorFlow; it converges and performs decently well on an independent test set. The Bayesian network, however, does not seem to converge at all: its loss sits around 10,000, whereas the dense network eventually converges to a loss of less than 1. I suspect the error lies in my implementation of the Bayesian neural network. Any help would be much appreciated.
Below are code snippets from the Bayesian neural network and from the dense network I am trying to replace.
Bayesian Neural Network
import tensorflow_probability as tfp
from tensorflow.keras.layers import Dense

# `merged` is the combined feature tensor from earlier in the model (not shown).
# The KL divergence between the approximate posterior q and the prior p is
# added to the loss for every Flipout layer.
kernel_divergence_fn = lambda q, p, _: tfp.distributions.kl_divergence(q, p)
bias_divergence_fn = lambda q, p, _: tfp.distributions.kl_divergence(q, p)

interpretation1 = tfp.layers.DenseFlipout(
    1000, activation="relu",
    bias_posterior_fn=tfp.layers.util.default_mean_field_normal_fn(),
    bias_prior_fn=tfp.layers.default_multivariate_normal_fn,
    kernel_divergence_fn=kernel_divergence_fn,
    bias_divergence_fn=bias_divergence_fn)(merged)
interpretation2 = tfp.layers.DenseFlipout(
    500, activation="relu",
    bias_posterior_fn=tfp.layers.util.default_mean_field_normal_fn(),
    bias_prior_fn=tfp.layers.default_multivariate_normal_fn,
    kernel_divergence_fn=kernel_divergence_fn,
    bias_divergence_fn=bias_divergence_fn)(interpretation1)
interpretation3 = tfp.layers.DenseFlipout(
    200, activation="relu",
    bias_posterior_fn=tfp.layers.util.default_mean_field_normal_fn(),
    bias_prior_fn=tfp.layers.default_multivariate_normal_fn,
    kernel_divergence_fn=kernel_divergence_fn,
    bias_divergence_fn=bias_divergence_fn)(interpretation2)

# Deterministic output layer on top of the Bayesian stack.
outputs = Dense(1)(interpretation3)
Dense Neural Network
dense1 = Dense(1000, activation="relu")(merged)
dense_drop1 = Dropout(0.35)(dense1)
dense2 = Dense(500, activation="relu")(dense_drop1)
dense_drop2 = Dropout(0.35)(dense2)
dense3 = Dense(200, activation="relu")(dense_drop2)
dense_drop3 = Dropout(0.35)(dense3)
Upvotes: 0
Views: 754
Reputation: 143
Can you include the lines where you compile the model and fit the data?
The high loss value corresponds to the contribution of the KL-divergence term. Because you have fixed priors for all the layers, they tend to pull your approximate posteriors toward them, and you observe a poor mean fit. One way to minimise the effect of the prior is type II maximum likelihood estimation (also called empirical Bayes), in which the prior parameters are also updated during training. See this blog for reference.
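A minimal sketch of what that could look like for the first DenseFlipout layer in the question, assuming TF 2.x and a recent TFP release: the kernel and bias priors are built with the same trainable mean-field constructor used for the posteriors, so the prior's loc and scale become variables updated during training. The layer width and divergence function mirror your snippet; the rest is illustrative, not a drop-in fix.

import tensorflow_probability as tfp

# Empirical Bayes sketch: build the priors with the trainable mean-field
# constructor so their loc/scale are variables learned alongside the posterior.
trainable_prior_fn = tfp.layers.util.default_mean_field_normal_fn()
divergence_fn = lambda q, p, _: tfp.distributions.kl_divergence(q, p)

interpretation1 = tfp.layers.DenseFlipout(
    1000, activation="relu",
    kernel_prior_fn=trainable_prior_fn,  # learned prior instead of fixed N(0, 1)
    bias_posterior_fn=tfp.layers.util.default_mean_field_normal_fn(),
    bias_prior_fn=trainable_prior_fn,
    kernel_divergence_fn=divergence_fn,
    bias_divergence_fn=divergence_fn)(merged)

With the priors free to move, the KL term no longer anchors the posteriors to a fixed standard normal, so the loss scale should come down as training progresses; whether the mean fit actually improves is something to verify on your validation set.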
Upvotes: 1