RexC

Reputation: 1

Regularization in Neural Networks

If an activation function like ReLU already sets the values of some nodes to zero, is it still necessary to use dropout in the same neural network? Dropout also randomly knocks out nodes, so is it beneficial to use ReLU and dropout together in a neural network?

Upvotes: 0

Views: 252

Answers (2)

Prune

Reputation: 77847

"Is it beneficial ..." is a question you really need to ask your model, not us. DL modeling is still an art -- in other words, intelligent trial and error. There is no universal answer for NNs. However, learning a bit about their effects can help you tune your own research.

An anthropomorphic view can help you build a broad understanding of how they operate within a large NN; here are my working interpretations.

ReLU is a simple tuning of attenuation for a kernel-in-training. Each matrix value is a measure of interest from the kernel's viewpoint: "How excited am I about this matrix element?" ReLU is a rule that helps focus the next layer. It says "if this position is boring, I don't care how boring it is. Don't waste time adjusting your snarling level; ignore it." All such values are set to 0, removing them from influence at succeeding layers. Further training depends only on positive identification of intermediate features.
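As a rough numerical illustration of that zeroing (plain NumPy, with a made-up vector of pre-activations, not anything from the question):

```python
import numpy as np

# Hypothetical pre-activations from one layer: negative values are the "boring" ones.
pre_activations = np.array([-2.0, -0.5, 0.0, 0.7, 3.1])

# ReLU: keep positive values as-is, clamp everything else to 0
# so it no longer influences succeeding layers.
relu_out = np.maximum(pre_activations, 0.0)
print(relu_out)  # [0.  0.  0.  0.7 3.1]
```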

Dropout is a different philosophy. It helps to protect the model against false intermediate conclusions. It says "let's take a fresh look at some of these things; forget some of what you learned and start over." The generic concept is that if something is "true learning", then it is supported by the input and/or remaining learning; we will quickly re-learn those weights. If it was an aberration of the input shuffling or noise in the data, then it's unlikely to reappear, and the erased weights will be put to a better purpose.
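A minimal sketch of that "forget some of what you learned" step, again in plain NumPy (inverted dropout with an assumed keep probability; real frameworks handle the train/inference switch for you):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical activations after ReLU.
activations = np.array([0.0, 0.0, 0.0, 0.7, 3.1])
keep_prob = 0.8  # assumed value; in practice a tunable hyperparameter

# Training time: randomly zero some units, scale the survivors by 1/keep_prob
# so the expected activation stays the same (inverted dropout).
mask = rng.random(activations.shape) < keep_prob
dropped = activations * mask / keep_prob
print(dropped)
```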

Upvotes: 1

Thomas Pinetz

Reputation: 7148

While both methods set some nodes to 0, dropout does so randomly while ReLU does so based on the input, so they are completely different in their usage. Dropout is used to reduce the likelihood that the network predicts based on a rigid structure of neurons, i.e. it encourages the network to include as many neurons as possible in its decision process. This makes it more robust to noise and therefore generalize better. ReLU is just a simple activation function that happens to work well in practice for training large networks.

So to conclude: yes, it makes sense to use them together, and doing so may help reduce overfitting; a typical way to stack them is sketched below.
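A common pattern, sketched here in PyTorch (the question names no framework, and the layer sizes and dropout rate are placeholders, not recommendations):

```python
import torch.nn as nn

# Linear -> ReLU -> Dropout is a standard block; sizes are placeholders.
model = nn.Sequential(
    nn.Linear(128, 64),
    nn.ReLU(),
    nn.Dropout(p=0.5),  # only active in model.train() mode, disabled at inference
    nn.Linear(64, 10),
)
```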

Upvotes: 1
