How to understand conv2d_transpose in TensorFlow

Question

The following is a test for conv2d_transpose.

import tensorflow as tf
import numpy as np
x = tf.constant(np.array([[
    [[-67], [-77]], 
    [[-117], [-127]]
]]), tf.float32)

# shape = (3, 3, 1, 1) -> (height, width, output_channels, input_channels) for conv2d_transpose - 3x3 filter, 1 channel in and out
f = tf.constant(np.array([
    [[[-1]], [[2]], [[-3]]], 
    [[[4]], [[-5]], [[6]]], 
    [[[-7]], [[8]], [[-9]]]
]), tf.float32)

conv = tf.nn.conv2d_transpose(x, f, output_shape=(1, 5, 5, 1), strides=[1, 2, 2, 1], padding='VALID')
print(conv)

The result:

tf.Tensor(
[[[[   67.]
   [ -134.]
   [  278.]
   [ -154.]
   [  231.]]

  [[ -268.]
   [  335.]
   [ -710.]
   [  385.]
   [ -462.]]

  [[  586.]
   [ -770.]
   [ 1620.]
   [ -870.]
   [ 1074.]]

  [[ -468.]
   [  585.]
   [-1210.]
   [  635.]
   [ -762.]]

  [[  819.]
   [ -936.]
   [ 1942.]
   [-1016.]
   [ 1143.]]]], shape=(1, 5, 5, 1), dtype=float32)

To my understanding, it should work as described in Figure 4.5 in the doc.

Therefore, the first element (conv[0,0,0,0]) should be -67*-9=603. Why does it turn out to be 67?

The result may be explained by the following image. But why is the convolution kernel inverted?

Balraj Ashwath · Accepted Answer

To explain this, I have made a draw.io figure illustrating the results that you obtained.

The illustration above should help explain why the first element of the transposed-convolution feature map is 67.

A key thing to note:

Unlike in a regular convolution, in a transposed convolution each element of the input feature map is multiplied by the entire filter, and the resulting intermediate feature maps are overlaid on one another, summing wherever they overlap, to create the final feature map. The stride determines how far apart the overlays are. In our case stride = 2, so each successive copy of the filter is shifted by 2 in both the x and y dimensions of the output. The top-left output element is covered only by the copy scaled by x[0,0,0,0] = -67, whose top-left filter element is -1, so conv[0,0,0,0] = -67 * -1 = 67.
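The overlay procedure can be checked with a small NumPy sketch. This is only an illustration for the single-channel, VALID-padding case in the question, not how TensorFlow implements the op; the helper names conv2d_transpose_valid and flipped_kernel_view are made up for this example. The second helper shows where the "inverted" kernel comes from: if you insist on describing the same operation as a sliding window over a zero-inserted, zero-padded input, the window has to use the filter flipped in both dimensions.

```python
import numpy as np

def conv2d_transpose_valid(x, f, stride):
    """Overlay (scatter-add) view of a single-channel transposed convolution."""
    in_h, in_w = x.shape
    k_h, k_w = f.shape
    out = np.zeros(((in_h - 1) * stride + k_h, (in_w - 1) * stride + k_w))
    for i in range(in_h):
        for j in range(in_w):
            # Each input element scales the whole, unflipped filter; the scaled
            # copy is added into the output at an offset of stride * (i, j).
            out[i * stride:i * stride + k_h,
                j * stride:j * stride + k_w] += x[i, j] * f
    return out

def flipped_kernel_view(x, f, stride):
    """Sliding-window view: zero-insert, pad, then slide the flipped filter."""
    in_h, in_w = x.shape
    k_h, k_w = f.shape
    # Insert (stride - 1) zeros between neighbouring input elements.
    dilated = np.zeros(((in_h - 1) * stride + 1, (in_w - 1) * stride + 1))
    dilated[::stride, ::stride] = x
    # Zero-pad by (kernel size - 1) on every side.
    padded = np.pad(dilated, ((k_h - 1, k_h - 1), (k_w - 1, k_w - 1)))
    flipped = f[::-1, ::-1]
    out_h = padded.shape[0] - k_h + 1
    out_w = padded.shape[1] - k_w + 1
    out = np.zeros((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            out[r, c] = np.sum(padded[r:r + k_h, c:c + k_w] * flipped)
    return out

# Same values as in the question, with the batch and channel axes dropped.
x = np.array([[-67.0, -77.0],
              [-117.0, -127.0]])
f = np.array([[-1.0, 2.0, -3.0],
              [4.0, -5.0, 6.0],
              [-7.0, 8.0, -9.0]])

out = conv2d_transpose_valid(x, f, stride=2)
out_flipped = flipped_kernel_view(x, f, stride=2)

print(out[0, 0])                         # 67.0, matching conv[0, 0, 0, 0]
print(np.array_equal(out, out_flipped))  # True: both views give the same map
```

So the two pictures describe the same computation: the overlay view uses the filter as given, while the sliding-window view needs the flipped filter, which is why the kernel looks inverted in the image.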
