Qubix

Reputation: 4353

TensorFlow: What exactly does depthwise convolution do?

I want to use depthwise_conv2d from TensorFlow. As far as I understand it now, it performs a separate regular 2D convolution on every single input channel, each producing depth_multiplier feature maps.

Then I should expect, if depth_multiplier = 1, to have the number of input channels the same as the number of output channels. But why can I have 256 input channels and 512 output channels? Where do the extra channels come from?

Upvotes: 4

Views: 4897

Answers (2)

Robert Lugg

Reputation: 1192

I've modified @vijay m 's code to spell things out further. His answer is absolutely correct; however, I still didn't get it at first.

The quick answer is that "channel multiplier" is a confusing name for that argument. It could be called "the number of filters you wish to apply per channel". Notice the shape of the filters in this line from that answer:

filters = tf.Variable(tf.random_normal((5,5,100,10)))

The last dimension of that shape (10) is what lets you apply 10 different filters to each of the 100 input channels. I've created a version of @vijay m 's code that may be instructive:

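import numpy as np
import tensorflow as tf  # written against the TensorFlow 1.x API (tf.Session)

# Batch of 2 inputs of 13x13 pixels with 3 channels each.
# Four 5x5 filters applied to each channel, so 12 output channels in total.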
inputs_np = np.ones((2, 13, 13, 3))
inputs = tf.constant(inputs_np)
# Build the filters so that their behavior is easier to understand.  For these filters
# which are 5x5, I set the middle pixel (location 2,2) to some value and leave
# the rest of the pixels at zero
filters_np = np.zeros((5,5,3,4)) # 5x5 filters for 3 inputs and applying 4 such filters to each one.
filters_np[2, 2, 0, 0] = 2.0
filters_np[2, 2, 0, 1] = 2.1
filters_np[2, 2, 0, 2] = 2.2
filters_np[2, 2, 0, 3] = 2.3
filters_np[2, 2, 1, 0] = 3.0
filters_np[2, 2, 1, 1] = 3.1
filters_np[2, 2, 1, 2] = 3.2
filters_np[2, 2, 1, 3] = 3.3
filters_np[2, 2, 2, 0] = 4.0
filters_np[2, 2, 2, 1] = 4.1
filters_np[2, 2, 2, 2] = 4.2
filters_np[2, 2, 2, 3] = 4.3
filters = tf.constant(filters_np)
out = tf.nn.depthwise_conv2d(
      inputs,
      filters,
      strides=[1,1,1,1],
      padding='SAME')
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    out_val = out.eval()

print("output cases 0 and 1 identical? {}".format(np.all(out_val[0]==out_val[1])))
print("One of the pixels for each of the 12 output {} ".format(out_val[0, 6, 6]))
# Output:
# output cases 0 and 1 identical? True
# One of the pixels for each of the 12 output [ 2.   2.1  2.2  2.3  3.   3.1  3.2  3.3  4.   4.1  4.2  4.3]
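
To make the channel ordering explicit, here is a minimal pure-Python sketch of what a stride-1, SAME-padded depthwise convolution computes for a single image. It is not TensorFlow's implementation; the function name depthwise_conv2d_same and the nested-list layout are my own assumptions for illustration, and it assumes odd filter sizes. The key line is the output index k * multiplier + q: filter q of input channel k lands in that output channel.

```python
def depthwise_conv2d_same(image, filters):
    """Naive stride-1, SAME-padded depthwise convolution for one image.

    image:   nested lists indexed [row][col][in_channel]
    filters: nested lists indexed [f_row][f_col][in_channel][multiplier]
    Returns nested lists indexed [row][col][out_channel].
    (Hypothetical helper for illustration; assumes odd filter sizes.)
    """
    height, width, in_channels = len(image), len(image[0]), len(image[0][0])
    f_h, f_w = len(filters), len(filters[0])
    multiplier = len(filters[0][0][0])
    pad_h, pad_w = (f_h - 1) // 2, (f_w - 1) // 2

    out = [[[0.0] * (in_channels * multiplier) for _ in range(width)]
           for _ in range(height)]
    for r in range(height):
        for c in range(width):
            for k in range(in_channels):        # each input channel on its own
                for q in range(multiplier):     # each filter for that channel
                    total = 0.0
                    for i in range(f_h):
                        for j in range(f_w):
                            rr, cc = r + i - pad_h, c + j - pad_w
                            # positions outside the image count as zero padding
                            if 0 <= rr < height and 0 <= cc < width:
                                total += image[rr][cc][k] * filters[i][j][k][q]
                    out[r][c][k * multiplier + q] = total
    return out


# Same setup as above: 13x13 image of ones with 3 channels,
# four 5x5 filters per channel with only the centre pixel set.
centre_values = [[2.0, 2.1, 2.2, 2.3],
                 [3.0, 3.1, 3.2, 3.3],
                 [4.0, 4.1, 4.2, 4.3]]
image = [[[1.0] * 3 for _ in range(13)] for _ in range(13)]
filters = [[[[0.0] * 4 for _ in range(3)] for _ in range(5)] for _ in range(5)]
for k in range(3):
    for q in range(4):
        filters[2][2][k][q] = centre_values[k][q]

out = depthwise_conv2d_same(image, filters)
pixel = out[6][6]
print(len(pixel))  # 12
print(pixel)       # [2.0, 2.1, 2.2, 2.3, 3.0, 3.1, 3.2, 3.3, 4.0, 4.1, 4.2, 4.3]
```

Note that no sum is taken across input channels, which is the difference from a regular conv2d: the 12 outputs are 3 channels times 4 filters, kept separate.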

Upvotes: 8

Vijay Mariappan

Reputation: 17201

The filters tensor has shape [filter_height, filter_width, in_channels, channel_multiplier]. If channel_multiplier = 1, the output has the same number of channels as the input. If it is N, the output has N * in_channels channels, because each input channel is convolved with N filters.

For example,

inputs = tf.Variable(tf.random_normal((20, 64,64,100)))
filters = tf.Variable(tf.random_normal((5,5,100,10)))
out = tf.nn.depthwise_conv2d(
      inputs,
      filters,
      strides=[1,1,1,1],
      padding='SAME')

This gives out with shape (20, 64, 64, 1000).
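
The shape rule can be sketched as a small helper. This is only an illustration, assuming stride 1 and SAME padding so the spatial size is unchanged; depthwise_output_shape is a hypothetical name, not a TensorFlow function. The second call uses made-up spatial and filter sizes to show that a channel multiplier of 2 would be one way to go from 256 input channels to 512 output channels.

```python
def depthwise_output_shape(input_shape, filter_shape):
    """Output shape of a depthwise convolution (hypothetical helper).

    Assumes NHWC input, stride 1 and SAME padding.
    input_shape:  (batch, height, width, in_channels)
    filter_shape: (filter_height, filter_width, in_channels, channel_multiplier)
    """
    batch, height, width, in_channels = input_shape
    _, _, filter_in_channels, channel_multiplier = filter_shape
    if filter_in_channels != in_channels:
        raise ValueError("filter in_channels must match input in_channels")
    return (batch, height, width, in_channels * channel_multiplier)


# The example above: 100 input channels, 10 filters per channel.
print(depthwise_output_shape((20, 64, 64, 100), (5, 5, 100, 10)))  # (20, 64, 64, 1000)

# Hypothetical sizes: 256 input channels with channel_multiplier = 2.
print(depthwise_output_shape((1, 32, 32, 256), (3, 3, 256, 2)))    # (1, 32, 32, 512)
```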

Upvotes: 9
