dtr43

Reputation: 155

Problem with solving equations of TransposedConv2D for finding its parameters

Regarding the answer posted here, when I use the equations to obtain the values of the parameters of the transposed convolution, I run into problems. For example, I have a tensor of size [16, 256, 16, 160, 160] and I want to upsample it to the size [16, 256, 16, 224, 224]. Based on the equation of the transposed convolution, when I select stride=2 and solve the equation for the height to find k (the kernel size), I get the following equation, in which the kernel size comes out large and negative.

224 = (160 - 2) x 2 + 1 x (k - 1) + 1

What is wrong with my calculations, and how can I find the parameters?

Upvotes: 4

Views: 134

Answers (2)

Alexey Birukov

Reputation: 1680

There is no good constructive answer to this question.

Being in some sense the inverse of conv2d, which downsamples an image by a factor of stride, transposed_conv2d upsamples by a factor of stride. You cannot use it for an arbitrary resize and expect uniformly good results; there is torchvision.transforms.Resize or adaptive pooling for that.

torchvision.transforms.Resize is the default choice: it is simple and flexible, and you can feed it either a PIL image or a torch.Tensor. Use the former if the input sizes vary dynamically, the latter if they do not.
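
A minimal sketch of the tensor path (the 4D shape here is just an illustration; Resize operates on the trailing two dimensions of whatever tensor you feed it):

import torch
from torchvision import transforms

x = torch.randn(16, 256, 160, 160)       # (N, C, H, W), example values
resize = transforms.Resize((224, 224))   # target spatial size
y = resize(x)                            # acts on the trailing (H, W) dims
print(y.shape)                           # torch.Size([16, 256, 224, 224])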

Adaptive pooling, usually it is AdaptiveAvgPool2d, is more sofisticated, it supposed to be a part of architecture. Being inserted at the begining of network, it works as (batched) ImageResize; no magic - it is CPU implemented usualy, one will have a hard time implementing it on tensor hardware. In embedded solutions it is typical to have special image processor for such work.
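
A sketch of that pattern, with made-up input sizes, just to show that the output resolution is fixed no matter what comes in:

import torch
import torch.nn as nn

pool = nn.AdaptiveAvgPool2d((224, 224))            # fixed target resolution
print(pool(torch.randn(16, 3, 300, 280)).shape)    # torch.Size([16, 3, 224, 224])
print(pool(torch.randn(16, 3, 512, 640)).shape)    # torch.Size([16, 3, 224, 224])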

Well, you could still formally solve the task with transposed_conv2d by playing with padding, but that would just cut off part of the image, probably losing information, or insert a lot of useless spacing. See the sketch below.
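
For completeness, one such formal solution for the 160 → 224 case, reduced to 2D; the kernel and padding values below are just one combination that satisfies the output-size formula, not a recommendation:

import torch
import torch.nn as nn

# (H_in - 1)*stride - 2*padding + (kernel_size - 1) + output_padding + 1
# = 159*2 - 2*48 + 1 + 0 + 1 = 224
up = nn.ConvTranspose2d(256, 256, kernel_size=2, stride=2, padding=48)
x = torch.randn(16, 256, 160, 160)
print(up(x).shape)   # torch.Size([16, 256, 224, 224])
# The padding of 48 simply crops the borders of the 320x320 result, i.e. the
# "cutting off part of the image" mentioned above.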

Upvotes: 1

Devendra Vyas

Reputation: 95

I don't think you applied the formula incorrectly; I think the issue is primarily that the input and output dimensions you want are not possible with stride=2.

Transposed or dilated convolutions scale the output really quickly. Let's say, for example, you were taking these params for your transposed convolution (I'm simplifying the values here to 1D just to make the calculations clear):

Input Size = 160
Stride = 2
Kernel = 1
Padding = 0
Output Padding = 0

Now we apply the formula from the official docs for calculating output shape:

H_out = (H_in − 1) × stride[0] − 2 × padding[0] + dilation[0] × (kernel_size[0] − 1) + output_padding[0] + 1

OR we can simplify the formula a bit:

Output Size = ((Input Size - 1) * Stride) - (2 * Padding) + Filter_Size + Output Padding + 1

Here, Filter_Size = dilation_factor * (kernel_size - 1), to make the formula seem less scary.
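
As a quick helper for the arithmetic below (the function name is my own, not from the docs):

def conv_transpose_out_size(in_size, stride, padding, kernel_size, dilation=1, output_padding=0):
    # one spatial dimension, per the formula from the official docs above
    return (in_size - 1) * stride - 2 * padding + dilation * (kernel_size - 1) + output_padding + 1

print(conv_transpose_out_size(160, stride=2, padding=0, kernel_size=1))  # 319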

Now let's take our example and put in the values to see what transposed output size we can get with stride=2 and the smallest kernel size possible, that is, kernel=1:

Output_Size = ((160-1)*2) - (2*0) + 1*(1-1) + 0 + 1
Output_Size = 318 - 0 + 0 + 0 + 1
Output_Size = 319

So, with the stride you want, you will always have an output_size >= 319, while you want 224; hence the negative kernel_size.
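
The same check with an actual layer (a 2D stand-in for your 3D case, channel count taken from your tensor):

import torch
import torch.nn as nn

layer = nn.ConvTranspose2d(256, 256, kernel_size=1, stride=2)   # smallest kernel, your stride
x = torch.randn(16, 256, 160, 160)
print(layer(x).shape)   # torch.Size([16, 256, 319, 319]) -- already well past 224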

I hope that answers your question.

Reference links to understand transposed convolution calculations better, with examples:

Paperspace: Transpose Convolution Explained for Up-Sampling Images

Calculating the Output Size of Convolutions and Transpose Convolutions

Upvotes: 1
