Reputation: 2279
I'm reimplementing MobileNet, but I find that depthwise convolution is no faster than Conv2D (I haven't included the 1×1 pointwise convolution yet). Here's the test code, run on Colab: https://colab.research.google.com/drive/1nBuYrmmH5kM0jbtIZdsuiG6uJbU6mpA7?usp=sharing
import tensorflow as tf
import time
x = tf.random.normal((2, 64, 64, 3))
conv = tf.keras.layers.Conv2D(16, 3, strides=1, padding='same')
dw = tf.keras.layers.DepthwiseConv2D(3, padding='same')
start = time.time()
conv(x)
print('conv2d:', time.time() - start) # approximate 0.0036s
start = time.time()
dw(x)
print('dw:', time.time() - start) # approximate 0.0034s
%timeit conv(x) # 1000 loops, best of 3: 225 µs per loop
%timeit dw(x) # 1000 loops, best of 3: 352 µs per loop
I also tried it on my laptop using CPU only and observed similar results. Why would DepthwiseConv2D be slower than Conv2D? Did I make any mistakes?
Upvotes: 3
Views: 2657
Reputation: 26048
Although they use far fewer parameters and FLOPs, depthwise 2D convolutions can indeed be slower than regular 2D convolutions.
Gholami et al. (SqueezeNext: Hardware-Aware Neural Network Design) state:
The reason for this is the inefficiency of depthwise-separable convolution in terms of hardware performance, which is due to its poor arithmetic intensity (ratio of compute to memory operations).
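To make "poor arithmetic intensity" concrete, here is a back-of-the-envelope sketch (my own illustration, not from the paper) comparing FLOPs per byte moved for a regular 3×3 conv versus a depthwise 3×3 conv on the shapes from your test, assuming float32 tensors and that input, weights, and output each cross memory exactly once. Real kernels cache and tile, so treat the numbers as rough ratios, not measurements.

```python
# Back-of-the-envelope arithmetic intensity (FLOPs per byte of memory traffic)
# for the question's shapes: 64x64 input, 3 input channels, 16 output channels,
# 3x3 kernel, 'same' padding. Assumes float32 and one pass over each tensor.
H, W, C_in, C_out, K = 64, 64, 3, 16, 3
BYTES = 4  # float32

# Regular Conv2D: every output channel sums over every input channel,
# so each loaded input value feeds many multiply-adds.
conv_flops = 2 * H * W * K * K * C_in * C_out        # x2 for multiply + add
conv_bytes = BYTES * (H * W * C_in                   # read input
                      + K * K * C_in * C_out         # read weights
                      + H * W * C_out)               # write output

# DepthwiseConv2D: one filter per input channel, no cross-channel reduction,
# so far fewer FLOPs are amortized over roughly the same memory traffic.
dw_flops = 2 * H * W * K * K * C_in
dw_bytes = BYTES * (H * W * C_in                     # read input
                    + K * K * C_in                   # read weights
                    + H * W * C_in)                  # write output

print(f"conv2d:    {conv_flops / conv_bytes:.1f} FLOPs/byte")
print(f"depthwise: {dw_flops / dw_bytes:.1f} FLOPs/byte")
```

Under these assumptions the depthwise layer does several times less compute per byte moved, so on hardware where memory bandwidth (or kernel launch/dispatch overhead) dominates, it cannot run proportionally faster than the dense convolution, and at tiny sizes like this it can even lose.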
Upvotes: 4