I'm trying to understand TensorFlow's convolution, in particular the output-shape formula from the documentation:
shape(output) = [batch,
(in_height - filter_height + 1) / strides[1],
(in_width - filter_width + 1) / strides[2],
...]
I would have expected the formula to be
shape(output) = [batch,
(in_height - filter_height) / strides[1] + 1,
(in_width - filter_width) / strides[2] + 1,
...]
instead. Starting from a 32x32 image and applying a 5x5 filter with strides [1,3,3,1], I would expect a 10x10 output whose values are the convolutions of the areas
(0:4,0:4) , (0:4,3:7) , (0:4,6:10) , ..., (0:4,27:31),
(3:7,0:4) , (3:7,3:7) , (3:7,6:10) , ..., (3:7,27:31),
...
(27:31,0:4), (27:31,3:7), (27:31,6:10), ..., (27:31,27:31)
so both dimensions should be floor((32-5)/3)+1=10 and not floor((32-5+1)/3)=9. What am I missing here? Have I misunderstood the way convolution is done here and/or what the parameters mean? If so, what parameters should I use in order to obtain the above selection?
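A quick way to check the count is to enumerate the window start positions directly. The sketch below is plain Python (no TensorFlow). It assumes no padding and reads the documented `/` as floor division. `window_starts` is a hypothetical helper name used only for illustration.

```python
def window_starts(in_size, filter_size, stride):
    """Start indices of every filter window that fits fully inside the
    input (no padding), stepping by `stride`."""
    return list(range(0, in_size - filter_size + 1, stride))

in_size, filter_size, stride = 32, 5, 3

starts = window_starts(in_size, filter_size, stride)
# Inclusive (start, end) ranges, matching the (a:b) notation in the question
windows = [(s, s + filter_size - 1) for s in starts]

expected = (in_size - filter_size) // stride + 1   # formula the question expects
doc_floor = (in_size - filter_size + 1) // stride  # documented formula, read as floor division

print(windows[:3], windows[-1])  # [(0, 4), (3, 7), (6, 10)] (27, 31)
print(len(starts), expected, doc_floor)  # 10 10 9
```

The enumeration produces 10 windows per axis, ending at (27:31). That matches the expected formula, not the floor-division reading of the documented one.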
According to issue #196, this part of the documentation is apparently wrong, and I think there is still a problem in dga's answer.
It should be:
floor((in_height + y_padding - filter_height) / y_stride) + 1
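Here is a minimal one-axis sketch of that formula. It assumes `y_padding` means the *total* number of zero rows added (both sides combined). The answer does not say this explicitly, so treat it as an assumption. `conv_output_size` is a hypothetical helper name.

```python
def conv_output_size(in_size, filter_size, stride, padding=0):
    """Output length along one axis.

    Assumes `padding` is the TOTAL number of zero rows/columns added
    (both sides combined), not the amount added per side.
    """
    return (in_size + padding - filter_size) // stride + 1

# No padding: reproduces the 10 from the question
print(conv_output_size(32, 5, 3))             # 10
# 2 rows of padding on each side (4 total): (32 + 4 - 5) // 3 + 1 = 11
print(conv_output_size(32, 5, 3, padding=4))  # 11
```

With `padding=0` this reduces to the formula the question expected.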
You're correct - it should be:
ceil(float(in_height - filter_height + 1) / float(strides[1]))
For in_height = 32, filter_height = 5 and stride 3, this becomes ceil(28 / 3) = ceil(9.33) = 10.
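For positive integers a and s, ceil(a / s) = floor((a - 1) / s) + 1. So without padding, this ceil formula and the floor formula from the question should always agree. The brute-force check below is plain Python, and both helper names are hypothetical.

```python
import math

def out_size_ceil(in_size, filter_size, stride):
    # Formula from this answer: ceil((in - f + 1) / s)
    return math.ceil((in_size - filter_size + 1) / stride)

def out_size_floor(in_size, filter_size, stride):
    # Formula from the question: floor((in - f) / s) + 1
    return (in_size - filter_size) // stride + 1

print(out_size_ceil(32, 5, 3))  # ceil(28 / 3) = 10

# Exhaustive check over small sizes where the filter fits inside the input
mismatches = [
    (n, f, s)
    for n in range(1, 65)
    for f in range(1, n + 1)
    for s in range(1, 9)
    if out_size_ceil(n, f, s) != out_size_floor(n, f, s)
]
print(mismatches)  # []
```

The check only covers the unpadded case. Padding is where the two answers in this thread differ.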
Fixed, and the fix will be pushed to GitHub soon. Thanks for catching this! For more info, see the GitHub bug discussion in issue #196.