user1111929

Reputation: 6099

How to interpret TensorFlow's convolution filter and striding parameters?

I'm trying to understand TensorFlow's convolution, in particular the formula

shape(output) = [batch,
             (in_height - filter_height + 1) / strides[1],
             (in_width - filter_width + 1) / strides[2],
             ...]

I would have expected the formula to be

shape(output) = [batch,
             (in_height - filter_height) / strides[1] + 1,
             (in_width - filter_width) / strides[2] + 1,
             ...]

instead. Starting from a 32x32 image and applying a 5x5 filter with strides [1,3,3,1], my understanding is that this should yield a 10x10 output, whose values are the convolutions of the areas

 (0:4,0:4) ,  (0:4,3:7) ,  (0:4,6:10) , ...,  (0:4,27:31), 
 (3:7,0:4) ,  (3:7,3:7) ,  (3:7,6:10) , ...,  (3:7,27:31),
...
(27:31,0:4), (27:31,3:7), (27:31,6:10), ..., (27:31,27:31)

so both dimensions should be floor((32-5)/3)+1=10 and not floor((32-5+1)/3)=9. What am I missing here? Have I misunderstood how the convolution is computed, or what the parameters mean? If so, what parameters should I use to obtain the above selection?
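To make the counting above concrete, here is a minimal sketch in plain Python (no TensorFlow involved) that enumerates the window start positions along one dimension. The helper name `window_starts` is made up for illustration, and it assumes no padding, i.e. only windows that fit entirely inside the input are counted.

```python
def window_starts(in_size, filter_size, stride):
    """Start indices of every window that fits entirely inside the input (no padding)."""
    # The last valid start is in_size - filter_size, hence the exclusive bound + 1.
    return list(range(0, in_size - filter_size + 1, stride))


in_size, filter_size, stride = 32, 5, 3
starts = window_starts(in_size, filter_size, stride)

# Inclusive (first, last) index pairs, matching the (0:4), (3:7), ... notation above.
windows = [(s, s + filter_size - 1) for s in starts]

print(len(starts))   # number of output positions along this dimension
print(windows[0], windows[-1])
```

The number of start positions is the output size along that dimension, so counting them directly is a quick way to check whichever closed-form formula the documentation gives.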

Upvotes: 3

Views: 1805

Answers (2)

HSU

Reputation: 46

According to issue #196, this part of the documentation is apparently wrong, and I think there is still a problem in dga's answer.

It should be:

floor((in_height+y_padding-filter_height)/y_stride) + 1,

  • When padding=VALID, y_padding=0.
  • When padding=SAME, in general y_padding should be adjusted to make (in_height+y_padding-filter_height)/y_stride an integer so that 'floor' becomes unnecessary.
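The formula with the two padding cases can be sketched as follows, in plain Python. Note the assumption: for SAME, the total padding is chosen so that the output size is ceil(in_height / y_stride), which is the rule as I understand it from the TensorFlow documentation and is not stated in this thread. The function name `output_size` is hypothetical.

```python
import math


def output_size(in_size, filter_size, stride, padding):
    """Return (output size, total padding) along one spatial dimension."""
    if padding == "VALID":
        pad = 0
    elif padding == "SAME":
        # Assumption: SAME targets ceil(in_size / stride) outputs and pads
        # just enough for the last window to fit.
        target = math.ceil(in_size / stride)
        pad = max((target - 1) * stride + filter_size - in_size, 0)
    else:
        raise ValueError("padding must be 'VALID' or 'SAME'")
    # floor((in + pad - filter) / stride) + 1, as in the formula above.
    return (in_size + pad - filter_size) // stride + 1, pad


print(output_size(32, 5, 3, "VALID"))
print(output_size(32, 5, 3, "SAME"))
```

For the 32/5/3 example, the SAME case pads by 3 in total, so (32+3-5)/3 is exactly 10 and the floor has nothing to round.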

Upvotes: 3

dga

Reputation: 21917

You're correct - it should be:

ceil(float(in_height - filter_height + 1) / float(strides[1]))

For 32, 5, stride=3, this becomes: ceil(9.33) = 10.
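As a quick sanity check, the ceil form above and the floor form proposed in the question can be compared over a range of sizes when no padding is involved. This is only a sketch in plain Python; the function names `out_ceil` and `out_floor` are made up for illustration.

```python
import math


def out_ceil(in_size, filter_size, stride):
    # ceil(float(in_height - filter_height + 1) / float(strides[1]))
    return math.ceil(float(in_size - filter_size + 1) / float(stride))


def out_floor(in_size, filter_size, stride):
    # floor((in_height - filter_height) / strides[1]) + 1, from the question
    return (in_size - filter_size) // stride + 1


# Collect every (input, filter, stride) combination where the two forms disagree.
mismatches = [
    (n, f, s)
    for n in range(1, 65)
    for f in range(1, n + 1)
    for s in range(1, 9)
    if out_ceil(n, f, s) != out_floor(n, f, s)
]

print(out_ceil(32, 5, 3), out_floor(32, 5, 3))
print(mismatches)
```

Over this range the list of mismatches stays empty, so without padding the two expressions describe the same output size; any remaining difference between formulas would come from how padding is handled.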

Fixed; the change will be pushed to GitHub soon. Thanks for catching this! For more info, see the GitHub bug discussion, issue #196.

Upvotes: 2
