Stine

Reputation: 21

ReLU and sigmoid activation functions with a tfp.layers.DenseVariational Bayesian Neural Net

I am trying to set up a Bayesian Neural Network which is implemented with a statistical layer, tfp.layers.DenseVariational.

I have been testing various activation functions. Given my data, tanh or relu should work best.

However, I noticed that most Bayesian Neural Nets use sigmoid as the activation function. Does anyone know why?

Moreover, the Bayesian Neural Network is not able to train with relu activation. Is there a theoretical reason for this that I am overlooking?
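
Roughly, my setup looks like the sketch below. The prior/posterior factories follow the standard mean-field example from the TFP docs; the layer sizes, loss and `N` are placeholders, not my exact code:

```python
import numpy as np
import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions

# Mean-field posterior: an independent Normal for every kernel/bias weight.
def posterior_mean_field(kernel_size, bias_size=0, dtype=None):
    n = kernel_size + bias_size
    c = np.log(np.expm1(1.0))
    return tf.keras.Sequential([
        tfp.layers.VariableLayer(2 * n, dtype=dtype),
        tfp.layers.DistributionLambda(lambda t: tfd.Independent(
            tfd.Normal(loc=t[..., :n],
                       scale=1e-5 + tf.nn.softplus(c + t[..., n:])),
            reinterpreted_batch_ndims=1)),
    ])

# Trainable Normal prior.
def prior_trainable(kernel_size, bias_size=0, dtype=None):
    n = kernel_size + bias_size
    return tf.keras.Sequential([
        tfp.layers.VariableLayer(n, dtype=dtype),
        tfp.layers.DistributionLambda(lambda t: tfd.Independent(
            tfd.Normal(loc=t, scale=1.0),
            reinterpreted_batch_ndims=1)),
    ])

N = 1000  # placeholder: number of training samples

model = tf.keras.Sequential([
    tfp.layers.DenseVariational(
        units=16,
        make_posterior_fn=posterior_mean_field,
        make_prior_fn=prior_trainable,
        kl_weight=1.0 / N,
        activation='relu'),  # <- the activation I am experimenting with
    tfp.layers.DenseVariational(
        units=1,
        make_posterior_fn=posterior_mean_field,
        make_prior_fn=prior_trainable,
        kl_weight=1.0 / N),
])
model.compile(optimizer='adam', loss='mse')
```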

Any help is appreciated!

Upvotes: 1

Views: 106

Answers (1)

Vardan Grigoryants

Reputation: 1419

Based on this article, it turns out that ReLU may fail to produce normally distributed weights, due to its plateau below zero. Please have a look at the figure below [1]:

[Figure 1: The conditional likelihood for a weight from a fully connected layer of a model trained on 600 samples from MNIST, with ReLU and Leaky ReLU activations]

As suggested in the article, you can try Leaky ReLU instead.
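
As a quick standalone illustration of the difference on negative inputs (nothing here is specific to your model; tf.nn.relu and tf.nn.leaky_relu are the stock TensorFlow ops):

```python
import tensorflow as tf

x = tf.constant([-2.0, -0.5, 0.0, 0.5, 2.0])

# ReLU flattens everything below zero, so the gradient there is exactly 0.
print(tf.nn.relu(x).numpy())                   # [0.   0.   0.   0.5  2. ]

# Leaky ReLU keeps a small slope below zero, so those weights still get a signal.
print(tf.nn.leaky_relu(x, alpha=0.1).numpy())  # [-0.2  -0.05  0.   0.5  2. ]
```

In a DenseVariational layer you could pass activation=tf.nn.leaky_relu instead of 'relu', or leave activation=None and put a tf.keras.layers.LeakyReLU(alpha=...) layer right after it.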

However, it is not necessarily true that ReLU will always fail to generalize; please have a look at this. In that work, a Bayesian Neural Network was trained with the ReLU activation function for an image regression problem.

Regarding why sigmoid is common in Bayesian Neural Nets: I don't have a strict answer, but it is probably the following. Sigmoid is one of the first activation functions, and it is very common to reuse a "working" architecture once someone succeeds with it, so early use of sigmoid may simply have been carried along as historical "heritage". On the other hand, from my quick look, relatively newer articles and public works use ReLU. Another point could be that sigmoid ranges from 0 to 1, making it easier to interpret as a probability.

Upvotes: 0
