Receptive Field Arithmetic on the TF CNN example

Question

Below is the code TensorFlow provides. I will describe my current understanding of how the feature map sizes change through the network, and would greatly appreciate it if someone could point out where my misunderstanding is.

Overview: [28,28] -> 32 [24,24] -> 32 [12,12] -> 2048 [8,8]

Long version:

Start off with the [28,28] array.
The first convolution layer has 32 filters with a kernel size of [5,5], so the output is 32 [24,24]s.
The pooling layer, with a kernel of [2,2] and a stride of 2, keeps the number of arrays but halves their size, so the output is 32 [12,12]s.
The next convolution layer has 64 filters of size [5,5], so we end up with 2048 [8,8]s.

2048 [8,8]s is not what is represented in the subsequent code. What is my error here? All guidance is appreciated.

  # Input Layer
  input_layer = tf.reshape(features["x"], [-1, 28, 28, 1])

  # Convolutional Layer #1
  conv1 = tf.layers.conv2d(
      inputs=input_layer,
      filters=32,
      kernel_size=[5, 5],
      padding="same",
      activation=tf.nn.relu)

  # Pooling Layer #1
  pool1 = tf.layers.max_pooling2d(inputs=conv1, pool_size=[2, 2], strides=2)

  # Convolutional Layer #2 and Pooling Layer #2
  conv2 = tf.layers.conv2d(
      inputs=pool1,
      filters=64,
      kernel_size=[5, 5],
      padding="same",
      activation=tf.nn.relu)
  pool2 = tf.layers.max_pooling2d(inputs=conv2, pool_size=[2, 2], strides=2)

  # Dense Layer
  pool2_flat = tf.reshape(pool2, [-1, 7 * 7 * 64])
  dense = tf.layers.dense(inputs=pool2_flat, units=1024, activation=tf.nn.relu)
  dropout = tf.layers.dropout(
      inputs=dense, rate=0.4, training=mode == tf.estimator.ModeKeys.TRAIN)

Stephen · Accepted Answer

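The conv2d layers use padding="same", which zero-pads the input so that each convolution's output has the same height and width as its input. conv1 therefore produces 32 [28,28] maps, not 32 [24,24]. To get the sizes you expect, you would use padding="valid", which means no padding.

Note also that the second convolution produces 64 feature maps, not 32 x 64 = 2048. Each of its 64 filters spans all 32 input channels and sums over them, which is why the code flattens pool2 to 7 * 7 * 64.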
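To make the arithmetic concrete, here is a minimal sketch in plain Python (no TensorFlow needed) that traces the shapes through the network for both padding modes. The helper names `out_size` and `trace_shapes` are made up for this illustration. It assumes stride 1 for the convolutions and no padding for the pooling layers, which as far as I know are the defaults in `tf.layers`.

```python
import math


def out_size(size, kernel, stride=1, padding="same"):
    """Output length along one spatial dimension of a conv or pooling layer."""
    if padding == "same":
        # Zero-padding is added so that only the stride shrinks the output.
        return math.ceil(size / stride)
    # "valid": no padding, the kernel must fit entirely inside the input.
    return (size - kernel) // stride + 1


def trace_shapes(conv_padding):
    """Return the (height, width, channels) shape after each layer."""
    h = 28
    shapes = [(h, h, 1)]                 # input image
    h = out_size(h, 5, 1, conv_padding)
    shapes.append((h, h, 32))            # conv1: 32 filters
    h = out_size(h, 2, 2, "valid")
    shapes.append((h, h, 32))            # pool1: channel count unchanged
    h = out_size(h, 5, 1, conv_padding)
    shapes.append((h, h, 64))            # conv2: 64 filters -> 64 channels
    h = out_size(h, 2, 2, "valid")
    shapes.append((h, h, 64))            # pool2
    return shapes


print(trace_shapes("same"))   # spatial sizes 28 -> 28 -> 14 -> 14 -> 7
print(trace_shapes("valid"))  # spatial sizes 28 -> 24 -> 12 -> 8 -> 4
```

With "same" padding the final shape is (7, 7, 64), which matches the `7 * 7 * 64` in the reshape. With "valid" padding the spatial sizes follow the 24, 12, 8 sequence from the question, but the channel count after conv2 is still 64.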
