Reputation: 1926
I was reading a paper here. The authors propose a symmetric generator network that contains a stack of convolution layers followed by a stack of de-convolution (transposed convolution) layers. It is also mentioned that a stride of 1 with appropriate padding is used so that the feature map size stays the same as the input image size.
My question is: if there is no downsampling, why are transposed convolution layers used at all? Couldn't the generator be constructed with convolution layers only? Am I missing something about transposed convolution layers here (are they being used for some other purpose)? Please help.
Update: I am re-opening this question because I came across this paper, which states in section 2.1.1 that "deconvolution is used to compensate the details". However, I do not see why this should hold, since there is no downsampling of feature maps in the proposed model. Can somebody explain why deconvolution is preferred over convolution here? What makes a deconvolution layer perform better than a convolution layer in this case?
Upvotes: 0
Views: 310
Reputation: 339
In theory, spatial convolution can be used as a replacement for fractionally-strided convolution. It is typically avoided because, even without any kind of pooling, convolutional layers can produce outputs that are smaller than their inputs (see the formulae for `owidth` and `oheight` in the docs here). Using `nn.SpatialConvolution` to produce outputs that are larger than the inputs would require a great deal of inefficient zero-padding to reach the original input size. To make this reverse process easier, Torch added functionality for fractionally-strided convolution (`nn.SpatialFullConvolution`).
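To make the size behaviour concrete, here is a minimal sketch of the two formulas in action (the kernel, stride, padding, and plane counts are made-up illustration values, not anything from the paper):

```lua
require 'nn'

-- From the Torch docs (kernel kW/kH, stride dW/dH, padding padW/padH):
--   nn.SpatialConvolution:     owidth = floor((width + 2*padW - kW) / dW) + 1
--   nn.SpatialFullConvolution: owidth = (width - 1)*dW - 2*padW + kW + adjW
local x = torch.randn(1, 8, 16, 16)

-- strided convolution shrinks the map: floor((16 + 2 - 4)/2) + 1 = 8
local down = nn.SpatialConvolution(8, 8, 4, 4, 2, 2, 1, 1)
local y = down:forward(x)
print(y:size())               -- 1x8x8x8

-- fractionally-strided convolution grows it back: (8 - 1)*2 - 2 + 4 = 16
local up = nn.SpatialFullConvolution(8, 8, 4, 4, 2, 2, 1, 1)
print(up:forward(y):size())   -- 1x8x16x16, back to the input size
```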
That being said, this case is a bit different, since the spatial size at each layer remains constant. So it is quite possible that using `nn.SpatialConvolution` for the entire generator will work. You will still want to mirror the encoder's `nInputPlane` and `nOutputPlane` pattern to successfully move from feature space back to input space.
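As a rough sketch of what such an all-convolution generator could look like (the plane counts here are hypothetical, not the paper's): with `kW = kH = 3`, stride 1, and padding 1, every layer preserves the spatial size, and the decoder half simply mirrors the encoder's plane counts:

```lua
require 'nn'

-- Hypothetical symmetric generator built only from nn.SpatialConvolution.
-- With kW = kH = 3, dW = dH = 1, padW = padH = 1:
--   owidth = floor((width + 2*1 - 3) / 1) + 1 = width
-- so the spatial size never changes.
local net = nn.Sequential()
-- "encoder": 3 -> 64 -> 128 planes
net:add(nn.SpatialConvolution(3, 64, 3, 3, 1, 1, 1, 1))
net:add(nn.ReLU(true))
net:add(nn.SpatialConvolution(64, 128, 3, 3, 1, 1, 1, 1))
net:add(nn.ReLU(true))
-- "decoder": mirror the plane counts, 128 -> 64 -> 3
net:add(nn.SpatialConvolution(128, 64, 3, 3, 1, 1, 1, 1))
net:add(nn.ReLU(true))
net:add(nn.SpatialConvolution(64, 3, 3, 3, 1, 1, 1, 1))

local img = torch.randn(1, 3, 32, 32)
print(net:forward(img):size())   -- 1x3x32x32, same size as the input
```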
Most likely the authors described the decoder as using transposed convolution simply for clarity and generality.
This article discusses convolution and fractionally-strided convolution, and provides nice graphics that I do not wish to copy here.
Upvotes: 1