Reputation: 2079
I know that the bias is equivalent to appending a 1 to the input vector of each layer, or to adding a neuron with a constant output of 1. The weights going out of this bias neuron are ordinary weights that are trained along with the rest of the network.
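To make that equivalence concrete, here is a small NumPy sketch (the shapes are just illustrative): folding the bias into the weight matrix as an extra column and appending a constant 1 to the input gives the same output as adding the bias explicitly.

import numpy as np

rng = np.random.RandomState(0)
W = rng.randn(4, 3)          # ordinary weights
b = rng.randn(4)             # bias vector
x = rng.randn(3)             # input vector

y_with_bias = W @ x + b

W_aug = np.hstack([W, b[:, None]])   # bias becomes one more weight column
x_aug = np.append(x, 1.0)            # "bias neuron" with constant output 1
y_augmented = W_aug @ x_aug

print(np.allclose(y_with_bias, y_augmented))  # True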
Now I'm studying some neural network code in TensorFlow, e.g. this one (it's just part of a CNN (VGGNet), specifically the part where the convolutions end and the fully connected layers begin):
with tf.name_scope('conv5_3') as scope:
    kernel = tf.Variable(tf.truncated_normal([3, 3, 512, 512], dtype=tf.float32,
                                              stddev=1e-1), name='weights')
    conv = tf.nn.conv2d(self.conv5_2, kernel, [1, 1, 1, 1], padding='SAME')
    biases = tf.Variable(tf.constant(0.0, shape=[512], dtype=tf.float32),
                         trainable=True, name='biases')
    out = tf.nn.bias_add(conv, biases)
    self.conv5_3 = tf.nn.relu(out, name=scope)
    self.parameters += [kernel, biases]

# pool5
self.pool5 = tf.nn.max_pool(self.conv5_3,
                            ksize=[1, 2, 2, 1],
                            strides=[1, 2, 2, 1],
                            padding='SAME',
                            name='pool4')

with tf.name_scope('fc1') as scope:
    shape = int(np.prod(self.pool5.get_shape()[1:]))
    fc1w = tf.Variable(tf.truncated_normal([shape, 4096],
                                           dtype=tf.float32,
                                           stddev=1e-1), name='weights')
    fc1b = tf.Variable(tf.constant(1.0, shape=[4096], dtype=tf.float32),
                       trainable=True, name='biases')
    pool5_flat = tf.reshape(self.pool5, [-1, shape])
    fc1l = tf.nn.bias_add(tf.matmul(pool5_flat, fc1w), fc1b)
    self.fc1 = tf.nn.relu(fc1l)
    self.parameters += [fc1w, fc1b]
Now my question is: why is the bias initialized to 0 in the convolutional layers but to 1 in the fully connected layers (every conv layer in this model uses 0 for the bias and the FC layers use 1)? Or does my explanation only cover fully connected layers, and is it different for convolutional layers?
Upvotes: 2
Views: 834
Reputation: 53758
The bias (in any layer) is usually initialized with zeros, but random or small constant values are also possible. A quote from Stanford's CS231n course notes:
Initializing the biases. It is possible and common to initialize the biases to be zero, since the symmetry breaking is provided by the small random numbers in the weights. For ReLU non-linearities, some people like to use small constant value such as 0.01 for all biases because this ensures that all ReLU units fire in the beginning and therefore obtain and propagate some gradient. However, it is not clear if this provides a consistent improvement (in fact some results seem to indicate that this performs worse) and it is more common to simply use 0 bias initialization.
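To illustrate, here is a minimal sketch of those two options in the same TF 1.x style as the code above (the shape and variable names are just illustrative):

import tensorflow as tf

# Option 1: zero bias init (the common default)
biases_zero = tf.Variable(tf.constant(0.0, shape=[512], dtype=tf.float32),
                          trainable=True, name='biases_zero')

# Option 2: small positive constant, sometimes used with ReLU so every unit
# fires and receives a gradient at the start of training
biases_small = tf.Variable(tf.constant(0.01, shape=[512], dtype=tf.float32),
                           trainable=True, name='biases_small')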
Other examples: the tf.layers.dense function, which is a shortcut for creating FC layers, uses tf.zeros_initializer for the bias by default; and this sample CNN uses random initialization for all weights and biases, and it doesn't hurt performance.
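For example, with the TF 1.x layers API (the layer sizes here are just illustrative), leaving bias_initializer at its default gives zero biases, and you can override it with a small constant if you want:

import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[None, 128])

# default: biases start at zero
fc_default = tf.layers.dense(x, 4096, activation=tf.nn.relu)

# override: small constant biases, as in the CS231n quote above
fc_custom = tf.layers.dense(x, 4096, activation=tf.nn.relu,
                            bias_initializer=tf.constant_initializer(0.01))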
So, in summary, bias initialization isn't that important (compared to weight initialization), and I'm pretty sure you'll get similar training speed with a zero or small random init as well.
Upvotes: 1