kunal18

Reputation: 1926

necessity of transposed convolution when feature maps are not downsampled

I was reading a paper here. The authors of the paper propose a symmetric generator network which contains a stack of convolution layers followed by a stack of de-convolution (transposed convolution) layers. It is also mentioned that a stride of 1 with appropriate padding is used to ensure that the feature map size is the same as the input image size.

My question is: if there is no downsampling, why are transposed convolution layers used? Can't the generator be constructed with convolution layers only? Am I missing something about transposed convolution layers here (are they being used for some other purpose)? Please help.

Update: I am re-opening this question, as I came across this paper, which states in section 2.1.1 that "deconvolution is used to compensate the details". However, I am not able to see why, because there is no downsampling of feature maps in the proposed model. Can somebody explain why deconvolution is preferred over convolution here? What makes a deconvolution layer perform better than convolution in this case?

Upvotes: 0

Views: 310

Answers (1)

CanyonCat

Reputation: 339

In theory spatial convolution can be used as a replacement for fractionally-strided convolution. Typically this is avoided because, even without some type of pooling, convolutional layers can produce outputs that are smaller than their corresponding inputs (see the formulae for owidth and oheight in the docs here). Using nn.SpatialConvolution to produce outputs that are larger than inputs would require a great deal of inefficient zero-padding to reach the original input size. To make the reverse process easier, torch functionality was added for fractionally-strided convolution.
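The shrinking behavior follows directly from that output-size formula. Here is a quick sketch in plain Python (no Torch required; the parameter names `kW`, `dW`, `padW` mirror the ones used in the Torch docs):

```python
# Output width of a spatial convolution, per the usual convolution arithmetic:
# owidth = floor((width + 2*padW - kW) / dW) + 1
def conv_output_size(width, kW, dW=1, padW=0):
    return (width + 2 * padW - kW) // dW + 1

# Without padding, a 3x3 kernel shrinks the map: 32 -> 30.
print(conv_output_size(32, kW=3))

# With stride 1 and "same" padding (padW=1 for kW=3), the size is preserved: 32 -> 32.
print(conv_output_size(32, kW=3, padW=1))
```

This is why, without appropriate padding, a stack of plain convolutions keeps eating away at the spatial size, and why going back up with convolutions alone would need wasteful zero-padding.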

That being said, this case is a bit different since the size at each layer remains constant. So it is quite possible that using nn.SpatialConvolution for the entire generator will work. You'll still want to mirror the encoder's nInputPlane and nOutputPlane pattern to successfully move from feature space back to input space.
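To see why either layer type works here, compare the standard output-size formulas for convolution and fractionally-strided (transposed) convolution. This is a hedged sketch using the usual convolution arithmetic, not any specific Torch call:

```python
def conv_out(width, kW, dW=1, padW=0):
    # Standard spatial convolution output size.
    return (width + 2 * padW - kW) // dW + 1

def conv_transpose_out(width, kW, dW=1, padW=0):
    # Fractionally-strided (transposed) convolution output size.
    return (width - 1) * dW - 2 * padW + kW

# With stride 1 and "same" padding for a 3x3 kernel, both map 64 -> 64,
# so at constant resolution the two layer types produce identically
# shaped outputs.
print(conv_out(64, kW=3, padW=1))
print(conv_transpose_out(64, kW=3, padW=1))
```

At stride 1 the transposed convolution only upsamples in the degenerate sense (not at all), which is why a convolution-only generator with mirrored channel counts should be a drop-in replacement shape-wise.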

The authors likely described the decoder as using transposed convolution simply for clarity and generality.

This article discusses convolution and fractionally-strided convolution, and provides nice graphics that I do not wish to copy here.

Upvotes: 1
