Giovanni Crescencio

Reputation: 63

How to initialize weights when using the ReLU activation function

I want to make a conv network and I wish to use the ReLU activation function. Can someone please give me a clue about the correct way to initialize weights? (I'm using Theano.)

Thanks

Upvotes: 6

Views: 5291

Answers (1)

Daniel Renshaw

Reputation: 34197

I'm not sure there is a hard and fast best way to initialize weights and bias for a ReLU layer.

Some claim that (a slightly modified version of) Xavier initialization works well with ReLUs. Others claim that small Gaussian random weights plus a bias of 1 work well (ensuring that the weighted sum of positive inputs stays positive and thus does not end up in the ReLU's zero region).

In Theano, these can be achieved like this (assuming weights post-multiply the input):

w = theano.shared((numpy.random.randn(in_size, out_size) * 0.1).astype(theano.config.floatX))
b = theano.shared(numpy.ones(out_size, dtype=theano.config.floatX))

or

w = theano.shared((numpy.random.randn(in_size, out_size) * numpy.sqrt(2.0 / (in_size + out_size))).astype(theano.config.floatX))
b = theano.shared(numpy.zeros(out_size, dtype=theano.config.floatX))
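For context, here is a minimal self-contained sketch (not part of the original answer; the layer sizes and variable names are purely illustrative) showing how such shared variables might be wired into a dense ReLU layer. It uses the second initialization above and expresses ReLU as T.maximum(z, 0):

    import numpy
    import theano
    import theano.tensor as T

    # Hypothetical layer sizes, chosen only for illustration
    in_size, out_size = 784, 256

    # Modified-Xavier-style initialization; weights post-multiply the input
    w = theano.shared((numpy.random.randn(in_size, out_size)
                       * numpy.sqrt(2.0 / (in_size + out_size))).astype(theano.config.floatX))
    b = theano.shared(numpy.zeros(out_size, dtype=theano.config.floatX))

    x = T.matrix('x')                 # minibatch of inputs, shape (batch, in_size)
    z = T.dot(x, w) + b               # weighted sum plus bias
    h = T.maximum(z, 0)               # ReLU: max(0, z)

    layer = theano.function([x], h)
    out = layer(numpy.random.randn(32, in_size).astype(theano.config.floatX))
    print(out.shape)                  # (32, 256)

The same structure applies to a convolutional layer; only the shape of w and the dot product (conv2d instead of T.dot) change.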

Upvotes: 8
