Reputation: 155
I'm trying to understand the transformation performed by tf.layers.conv2d.
The mnist tutorial code from the TensorFlow website includes the convolution layer:
# Computes 64 features using a 5x5 filter.
# Padding is added to preserve width and height.
# Input Tensor Shape: [batch_size, 14, 14, 32]
# Output Tensor Shape: [batch_size, 14, 14, 64]
conv2 = tf.layers.conv2d(
    inputs=pool1,
    filters=64,
    kernel_size=[5, 5],
    padding="same",
    activation=tf.nn.relu)
However, my expectation was that the 32 input channels would be multiplied by the number of filters, since each filter is applied to each channel, giving an output tensor of [batch_size, 14, 14, 2048]. Clearly this is wrong, but I don't know why. How does the transformation work? The API documentation tells me nothing about how it works. What would the output be if the input tensor were [batch_size, 14, 14, 48]?
Upvotes: 2
Views: 6995
Reputation: 5722
I think you might have a minor misunderstanding of how the filters work here. This introduction and this answer provide some detailed explanation. I found the Convolution Demo animation in the introduction extremely helpful in showing how it works.
The key point is how each filter works. A convolutional layer usually has a set of K filters (64 in your example). Each filter's actual shape is kernel_size x depth_of_input (5x5x32 in your example). That means one filter looks at all 32 channels/images at once and produces one computed feature. Therefore, the depth (number of features) of the output equals your filters argument, not input_depth * filters. Please check this code to get an idea of the real, final kernel used in the computation.
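A minimal NumPy sketch of this (all shapes assumed from the example above, not taken from TensorFlow itself): one filter spans the full input depth and collapses one 5x5x32 receptive field to a single number, so 64 filters give 64 output channels at each spatial position.

```python
import numpy as np

rng = np.random.default_rng(0)
patch = rng.standard_normal((5, 5, 32))        # one receptive field across ALL 32 input channels
kernels = rng.standard_normal((5, 5, 32, 64))  # 64 filters, each of shape 5x5x32

# One filter collapses the entire 5x5x32 patch to a single scalar feature...
one_feature = np.sum(patch * kernels[..., 0])
print(one_feature.shape)  # ()

# ...so applying all 64 filters yields 64 output channels at this position.
features = np.einsum('hwc,hwck->k', patch, kernels)
print(features.shape)  # (64,)
```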
Therefore, to answer your last question: the output for an input of either [batch_size, 14, 14, 32] or [batch_size, 14, 14, 48] will always be [batch_size, 14, 14, 64] with your settings.
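To make that concrete, here is a naive "same"-padding, stride-1 convolution sketched in NumPy (NHWC layout assumed, matching tf.layers.conv2d's default; conv2d_same is a hypothetical helper, not a TensorFlow function). Only the kernel's depth changes with the input depth; the output shape does not.

```python
import numpy as np

def conv2d_same(x, kernels):
    """Naive 'same'-padding, stride-1 2-D convolution (NHWC layout)."""
    n, h, w, c = x.shape
    kh, kw, _, k = kernels.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((0, 0), (ph, ph), (pw, pw), (0, 0)))
    out = np.empty((n, h, w, k))
    for i in range(h):
        for j in range(w):
            patch = xp[:, i:i + kh, j:j + kw, :]       # full-depth receptive field
            out[:, i, j, :] = np.einsum('nhwc,hwck->nk', patch, kernels)
    return out

rng = np.random.default_rng(0)
for in_depth in (32, 48):
    x = rng.standard_normal((1, 14, 14, in_depth))
    kernels = rng.standard_normal((5, 5, in_depth, 64))
    print(in_depth, conv2d_same(x, kernels).shape)  # (1, 14, 14, 64) both times
```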
Upvotes: 1
Reputation: 61
The output size depends on the input dimensions, the filter width, padding, and stride. You can evaluate conv2 (or any individual layer, for that matter) and then print the dimensions of its output to confirm they are what you expect. You aren't required to call eval only on the final layer; TensorFlow is more flexible than that.
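The shape rule described above can be sketched as a small helper (an illustration, not a TensorFlow API; NHWC layout and the standard "same"/"valid" formulas are assumed):

```python
def conv2d_output_shape(input_shape, filters, kernel_size, padding="same", stride=1):
    """Compute the output and kernel shapes for a 2-D convolution (NHWC)."""
    batch, h, w, in_depth = input_shape
    kh, kw = kernel_size
    if padding == "same":
        out_h, out_w = -(-h // stride), -(-w // stride)  # ceil division
    else:  # "valid"
        out_h = (h - kh) // stride + 1
        out_w = (w - kw) // stride + 1
    # Input depth only shapes the kernel; output depth is always `filters`.
    kernel_shape = (kh, kw, in_depth, filters)
    return (batch, out_h, out_w, filters), kernel_shape

print(conv2d_output_shape((1, 14, 14, 32), 64, (5, 5)))  # ((1, 14, 14, 64), (5, 5, 32, 64))
print(conv2d_output_shape((1, 14, 14, 48), 64, (5, 5)))  # ((1, 14, 14, 64), (5, 5, 48, 64))
```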
Upvotes: 1