Reputation: 155
Regarding the answer posted here: when I try to use the equations to obtain the parameter values of a transposed convolution, I run into problems. For example, I have a tensor of size [16, 256, 16, 160, 160] and I want to upsample it to [16, 256, 16, 224, 224]. Based on the transposed convolution equation, when I solve the height equation with stride = 2 and try to find k (the kernel size), I end up with the following equation, which gives a large negative kernel size:
224 = (160 - 1) x 2 + 1 x (k - 1) + 1
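Solving that for k with my numbers (just to show what I mean):

```python
# Height equation with stride = 2, padding = 0, dilation = 1, output_padding = 0:
# 224 = (160 - 1)*2 + (k - 1) + 1  ->  k = 224 - (160 - 1)*2
H_in, H_out, stride = 160, 224, 2
k = H_out - (H_in - 1) * stride
print(k)   # -94, i.e. a negative kernel size
```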
What is wrong with my calculations, and how can I find the right parameters?
Upvotes: 4
Views: 134
Reputation: 1680
There is no good constructive answer to this question.
Being in some sense the inverse of conv2d, which downsamples an image by a factor of stride, transposed_conv2d upsamples by a factor of stride. You cannot use it for an arbitrary resize and expect uniformly good results; there is torchvision.transforms.Resize or adaptive pooling for that.
torchvision.transforms.Resize is the default choice. It is simple and flexible, and you can feed it either a PIL image or a torch.Tensor; use the former if the input sizes vary dynamically, the latter if they do not.
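As a minimal sketch (my own shapes, smaller than the question's so it runs comfortably, and assuming a reasonably recent torchvision), this is what the Resize route looks like on tensors; for the 5D tensor in the question, the extra leading dimension just gets folded away and restored:

```python
import torch
from torchvision import transforms

# Resize works on tensors shaped [..., H, W]; it resizes the last two dims.
resize = transforms.Resize((224, 224), antialias=True)

x = torch.randn(4, 3, 160, 160)            # an ordinary (N, C, H, W) batch
print(resize(x).shape)                     # torch.Size([4, 3, 224, 224])

# Stand-in for the [16, 256, 16, 160, 160] tensor from the question:
# collapse the leading dims to get a 4D tensor, resize, then reshape back.
v = torch.randn(2, 8, 16, 160, 160)
out = resize(v.reshape(-1, 16, 160, 160)).reshape(2, 8, 16, 224, 224)
print(out.shape)                           # torch.Size([2, 8, 16, 224, 224])
```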
Adaptive pooling, usually AdaptiveAvgPool2d, is more sophisticated; it is supposed to be part of the architecture. Inserted at the beginning of a network, it works as a (batched) image resize. There is no magic here: it is usually implemented on the CPU, and you will have a hard time implementing it on tensor hardware. In embedded solutions it is typical to have a dedicated image processor for such work.
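A minimal sketch of that idea (my own illustration, not code from any particular model): an adaptive pooling layer at the front of a network makes everything downstream see a fixed 224 x 224, whatever the incoming size is:

```python
import torch
import torch.nn as nn

# The pooling layer acts as a batched "resize to 224 x 224" in front of
# the first convolution.
front = nn.Sequential(
    nn.AdaptiveAvgPool2d((224, 224)),
    nn.Conv2d(3, 64, kernel_size=3, padding=1),
)

print(front(torch.randn(4, 3, 160, 160)).shape)   # torch.Size([4, 64, 224, 224])
print(front(torch.randn(4, 3, 320, 320)).shape)   # torch.Size([4, 64, 224, 224])
```

Note that when the input is smaller than the target (160 -> 224), adaptive average pooling only repeats or averages neighbouring values rather than smoothly interpolating, so it is mostly useful for shrinking to a fixed size.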
Well, you could still formally solve the task with transposed_conv2d by playing with the padding and kernel size, but it would amount to just cutting off part of the image, probably losing information, or inserting a lot of useless spacing.
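For instance, a minimal sketch with my own numbers: with stride=1 and no padding the output grows by kernel_size - 1, so a 65-wide kernel turns 160 into 224. The shapes work out, but the layer is just dragging a huge filter across the image, not doing a genuine stride-style upsample:

```python
import torch
import torch.nn as nn

# (160 - 1)*1 + (65 - 1) + 1 = 224, so the height/width equation is satisfied.
tconv = nn.ConvTranspose2d(in_channels=8, out_channels=8, kernel_size=65, stride=1)

x = torch.randn(1, 8, 160, 160)
print(tconv(x).shape)   # torch.Size([1, 8, 224, 224])
```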
Upvotes: 1
Reputation: 95
I don't think you applied the formula incorrectly; the issue is mainly that the input and output dimensions you want are simply not achievable with stride=2.
Transposed or dilated convolutions scale the output up really quickly. Let's say, for example, you take these parameters for your transposed convolution (I'm simplifying the values to 1D just to make the calculations clear):
Input Size = 160
Stride = 2
Kernel = 1
Padding = 0
Output Padding = 0
Now we apply the formula from the official docs for calculating output shape:
H_out = (H_in − 1) × stride[0] − 2 × padding[0] + dilation[0] × (kernel_size[0] − 1) + output_padding[0] + 1
OR we can simplify the formula a bit:
Output_Size = ((Input_Size - 1) * Stride) - (2 * Padding) + Filter_Size + Output_Padding + 1
Here, Filter_Size = dilation_factor * (kernel_size - 1), just to make the formula look less scary.
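If it helps, here is a tiny helper (my own throwaway function, not anything from the docs) that just evaluates this formula:

```python
# Plain-Python version of the transposed-convolution output-size formula above.
def tconv_output_size(in_size, stride, kernel, padding=0, dilation=1, output_padding=0):
    filter_size = dilation * (kernel - 1)
    return (in_size - 1) * stride - 2 * padding + filter_size + output_padding + 1

print(tconv_output_size(160, stride=2, kernel=1))   # 319
```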
Now let's take our example and put the values in to see what transposed output size we can get with stride=2 and the smallest kernel size possible, that is, kernel=1:
Output_Size = ((160 - 1) * 2) - (2 * 0) + 1 * (1 - 1) + 0 + 1
Output_Size = 318 - 0 + 0 + 0 + 1
Output_Size = 319
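We can double-check that arithmetic against PyTorch itself (a quick sketch with a single channel and my own toy input, just to confirm the shape):

```python
import torch
import torch.nn as nn

# Smallest kernel, stride 2, no padding: 160 -> 319, exactly as computed above.
tconv = nn.ConvTranspose2d(1, 1, kernel_size=1, stride=2)
print(tconv(torch.randn(1, 1, 160, 160)).shape)   # torch.Size([1, 1, 319, 319])
```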
So, with the stride you want, you will get an output size of at least 319, but you want 224; hence the negative kernel_size.
I hope that answers your question.
Reference links to understand transposed convolution calculations better, with worked examples:
Paperspace: Transpose Convolution Explained for Up-Sampling Images
Calculating the Output Size of Convolutions and Transpose Convolutions
Upvotes: 1