Reputation: 2307
I am studying TensorBoard code from Dandelion Mane, specifically: https://github.com/dandelionmane/tf-dev-summit-tensorboard-tutorial/blob/master/mnist.py
His convolution layer is defined as:
def conv_layer(input, size_in, size_out, name="conv"):
    with tf.name_scope(name):
        w = tf.Variable(tf.truncated_normal([5, 5, size_in, size_out], stddev=0.1), name="W")
        b = tf.Variable(tf.constant(0.1, shape=[size_out]), name="B")
        conv = tf.nn.conv2d(input, w, strides=[1, 1, 1, 1], padding="SAME")
        act = tf.nn.relu(conv + b)
        tf.summary.histogram("weights", w)
        tf.summary.histogram("biases", b)
        tf.summary.histogram("activations", act)
        return tf.nn.max_pool(act, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="SAME")
I am trying to work out the effect of conv2d on the input tensor size. As far as I can tell, the first 3 dimensions are unchanged, and the last dimension of the output follows the last dimension of w.
For example, a ?x47x36x64 input becomes ?x47x36x128 with w of shape 5x5x64x128.
And I also see that a ?x24x18x128 input becomes ?x24x18x256 with w of shape 5x5x128x256.
So, for an input of size [a,b,c,d], is the resulting output size [a,b,c,w.shape[3]]?
Would it be correct to think that the first dimension does not change?
Upvotes: 1
Views: 1616
Reputation: 12908
This works in your case because of the stride used and the padding applied. The output width and height will not always be the same as the input.
Check out this excellent discussion of the topic. The basic takeaway (taken almost verbatim from that link) is that a convolution layer:
- accepts an input volume of size W1 x H1 x D1
- requires four hyperparameters: the number of filters K, their spatial extent F, the stride S, and the amount of zero padding P
- produces an output volume of size W2 x H2 x D2, where:
W2 = (W1 - F + 2*P)/S + 1
H2 = (H1 - F + 2*P)/S + 1
D2 = K
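The formulas above can be sketched as a small helper (conv_output_shape is a hypothetical function for illustration, not part of TensorFlow) and checked against the shapes from the question:

```python
def conv_output_shape(w1, h1, d1, k, f, s, p):
    """Output volume of a conv layer per the formulas above.

    w1, h1, d1: input width, height, depth
    k: number of filters, f: spatial extent (filter size),
    s: stride, p: zero padding per side
    """
    w2 = (w1 - f + 2 * p) // s + 1
    h2 = (h1 - f + 2 * p) // s + 1
    d2 = k
    return (w2, h2, d2)

# The question's first example: 47x36x64 input, 5x5x64x128 filter,
# stride 1, and P = 2 (what "SAME" padding amounts to for F = 5):
print(conv_output_shape(47, 36, 64, k=128, f=5, s=1, p=2))  # (47, 36, 128)
```

Running the second example, conv_output_shape(24, 18, 128, k=256, f=5, s=1, p=2), likewise gives (24, 18, 256), matching what you observed.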
And when you are processing batches of data in TensorFlow they typically have shape [batch_size, height, width, channels], so the first dimension, which is just the number of samples in your batch, should not change.
Note that the amount of padding P in the above is a little tricky with TF. When you pass the padding="SAME" argument to tf.nn.conv2d, TensorFlow applies zero padding to both sides of the image so that no pixels are ignored by your filter, but it may not add the same amount of padding to each side (the two sides can differ by one). This SO thread has some good discussion on the topic.
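A sketch of that padding rule, assuming the "SAME" arithmetic described in the TF docs (same_padding is a hypothetical helper, not a TensorFlow API):

```python
import math

def same_padding(in_size, f, s):
    """Per-side zero padding TF-style 'SAME' applies along one spatial dim.

    Total padding is chosen so the output size is ceil(in_size / s);
    when the total is odd, the extra pixel goes on the after side.
    """
    out_size = math.ceil(in_size / s)
    pad_total = max((out_size - 1) * s + f - in_size, 0)
    pad_before = pad_total // 2
    pad_after = pad_total - pad_before
    return pad_before, pad_after

print(same_padding(47, f=5, s=1))  # (2, 2): symmetric for an odd filter
print(same_padding(47, f=4, s=1))  # (1, 2): sides differ by one
```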
In general, with a stride S of 1 (which your network has), zero padding of P = (F - 1)/2 will ensure that the output width/height equals the input, i.e. W2 = W1 and H2 = H1. In your case, F is 5, so tf.nn.conv2d must be adding two zeros to each side of the image for a P of 2, and your output width according to the above equation is W2 = (W1 - 5 + 2*2)/1 + 1 = W1 - 1 + 1 = W1.
Upvotes: 3