Stephen

Reputation: 8800

What is the behavior of SAME padding when stride is greater than 1?

My understanding of SAME padding in TensorFlow is that padding is added such that the output dimensions (for width and height) will be the same as the input dimensions. However, this understanding only really makes sense when stride=1, because if stride is >1 then the output dimensions will almost certainly be smaller.

So I'm wondering what the algorithm is for calculating padding in this case. Is it simply that padding is added so that the filter is applied to every input value, rather than leaving some off on the right?

Upvotes: 14

Views: 9963

Answers (3)

Jonny_92

Reputation: 159

I will expand on the very nice previous answers. I cannot use LaTeX here, otherwise I would write the equations in a nicer way. The "SAME" padding condition means that the output size o of the layer is equal to ceil(i/s), where i is the input size and s is the stride. I am developing the one-dimensional case, because the axes are independent.

The general formula for the output size of a convolution is o = floor((i + 2p - k)/s) + 1, where p is the padding on each side and k is the kernel size. The only unknown in this equation is p, because the output size is already fixed by the 'same' condition.

Given the equation floor(x/y) = n, where x, y and n are integers and y > 0, the minimum value of x is the one for which x % y = 0, that is x = ny. In that case you can drop the floor. Substitute x = i + 2p - k, y = s, and n = o - 1. Then, if you solve for p, you obtain p = ((o-1)s - i + k)/2. This is the minimum value of padding that ensures the 'same' condition.

The maximum value of x is the one for which x % y = y-1. By the division identity, you can write x = ny + (y-1). Substitute x, y and n as before, so that i + 2p - k = (o-1)s + (s-1) = os - 1, and solve for p. You obtain p = (k - i + os - 1)/2. This is the maximum value of padding that satisfies the 'same' condition. Any value of padding between these two extremes is fine, but usually the minimum one is used.
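To make the two bounds concrete, here is a minimal sketch in plain Python (no TensorFlow needed). The function names and the example sizes i=10, k=3, s=4 are my own choices for illustration. It computes the per-side bounds p_min = ((o-1)s - i + k)/2 and p_max = (os - i + k - 1)/2, the latter following from x = ny + (y-1), and then checks by brute force which total paddings 2p actually give o = ceil(i/s).

```python
from fractions import Fraction


def conv_output_size(i, k, s, total_pad):
    """o = floor((i + 2p - k) / s) + 1, where total_pad stands for 2p."""
    return (i + total_pad - k) // s + 1


def same_padding_bounds(i, k, s):
    """Per-side padding bounds (p_min, p_max) that give o = ceil(i / s)."""
    o = -(-i // s)  # integer ceil(i / s)
    p_min = Fraction((o - 1) * s - i + k, 2)
    p_max = Fraction(o * s - i + k - 1, 2)
    return p_min, p_max


# Illustrative sizes (assumed for this example only).
i, k, s = 10, 3, 4
target = -(-i // s)
p_min, p_max = same_padding_bounds(i, k, s)

# Brute-force check: which total paddings (2p) produce the target output size?
valid_totals = [t for t in range(0, 2 * k + s)
                if conv_output_size(i, k, s, t) == target]

print("p_min =", p_min, "p_max =", p_max)
print("total paddings giving ceil(i/s):", valid_totals)
```

The valid totals should run from 2*p_min to 2*p_max inclusive, which is s values in all. A half-integer p simply means the total padding is odd and has to be split unevenly between the two sides.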

Upvotes: 0

Ginés Hidalgo

Reputation: 817

Peter's answer is correct but lacks a few details. Let me add to it.

Autopadding = SAME means that: o = ceil(i/s), where o = output size, i = input size, s = stride.

In addition, the generic output size formula is:

o = floor( (i + p - k) / s)   +   1

Where the new terms are p (padding, here the total added across both sides, since the formula uses p rather than 2p) and k, i.e., the effective kernel size (including dilation, or just the kernel size if dilation is disabled).

If you develop that formula to solve for p, you get:

p_min = (o-1) s - i + k # i.e., when the floor is removed from the previous equation
p_max = o s - i + k - 1 # i.e., when the numerator of the floor % s is s-1

Any integer padding value p in the range [p_min, p_max] will satisfy the condition o = ceil(i/s), meaning that for a stride s there are s solutions in total that satisfy the formula.

It is the norm to use p_min as padding, so you can ignore all other s-1 solutions.

PS: This would be for 1D, but for nD, simply repeat these formulas independently for each dimension, i.e.,

p_min[dimension_index] = (o[dimension_index]-1)s[dimension_index] - i[dimension_index] + k[dimension_index]
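As a rough sketch of the per-dimension rule, the plain-Python function below applies p_min independently to each axis. Two things here are my assumptions rather than part of the formulas above: the total is clamped at zero (p_min can come out negative when the kernel is smaller than the stride), and an odd total is split with the extra cell at the end of the axis. The function name and the example shapes are made up for illustration.

```python
def same_padding_nd(input_sizes, kernel_sizes, strides):
    """Return a (before, after) padding pair for each dimension.

    kernel_sizes are effective kernel sizes (dilation already included).
    """
    pads = []
    for i, k, s in zip(input_sizes, kernel_sizes, strides):
        o = -(-i // s)                       # o = ceil(i / s)
        total = max((o - 1) * s - i + k, 0)  # p_min, clamped at zero (assumption)
        before = total // 2                  # smaller half first (assumption)
        after = total - before               # any odd leftover goes at the end
        pads.append((before, after))
    return pads


# Hypothetical 2D case: 5x10 input, 3x3 kernel, strides (2, 4).
print(same_padding_nd((5, 10), (3, 3), (2, 4)))  # → [(1, 1), (0, 1)]
```

The second dimension has p_min = 1, so it cannot be split evenly, which is why the pair is (0, 1) rather than two equal halves.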

For references, these 2 links are really useful:

Upvotes: 5

PeterZhao

Reputation: 69

There is a formula for that:

n' = floor((n+2*p-f)/s + 1)

where n' is the output size, n is the input size, p is the padding on each side, f is the filter size, and s is the stride.

If you are using SAME padding with stride > 1, p will be the smallest value that gives n' = ceil(n/s); at that value (n+2*p-f) is exactly divisible by s. Note: p can be fractional (a multiple of 1/2), because it is the total padding averaged over the two sides of the image.
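Here is a small worked example of the fractional case, as a sketch in plain Python. It searches for the smallest p in steps of 1/2 such that the formula above gives n' = ceil(n/s). The helper names and the sizes n=4, f=3, s=2 are assumptions for illustration only.

```python
from fractions import Fraction
from math import floor


def output_size(n, p, f, s):
    """n' = floor((n + 2*p - f) / s + 1), with p allowed to be a half-integer."""
    return floor(Fraction(n + 2 * p - f) / s + 1)


def min_same_padding(n, f, s):
    """Smallest per-side p (in steps of 1/2) with n' = ceil(n / s)."""
    target = -(-n // s)  # integer ceil(n / s)
    p = Fraction(0)
    while output_size(n, p, f, s) < target:
        p += Fraction(1, 2)
    return p


n, f, s = 4, 3, 2
p = min_same_padding(n, f, s)
print("p =", p)                                   # → p = 1/2
print("n' =", output_size(n, p, f, s))            # → n' = 2
print("divisible:", (n + 2 * p - f) % s == 0)     # → divisible: True
```

With p = 1/2 the total padding is a single cell, so one side of the image gets 0 and the other gets 1.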

Upvotes: 6
