Depthwise separable Convolution

Question

I am new to Deep Learning and I recently came across Depth Wise Separable Convolutions. They significantly reduce the computation required to process the data and need only like 10% of standard convolution step computation.

I am curious as to what is the intuition behind doing this? We do achieve greater speed by reducing number of parameters and doing less computation, but is there a trade-off in performance?

Also, is it only used for some specific use cases like images, etc or can it be applied to all forms of data?

pietz · Accepted Answer

Intuition

The intuition behind doing this is to decouple the spatial information (width and height) and the depthwise information (channels). While regular convolutional layers will merge feature maps over the number of input channels, depthwise separable convolutions will perform another 1x1 convolution before adding them up.

Performance

Using a depthwise separable convolutional layer as a drop-in replacement for a regular one will greatly reduce the number of weights in the model. It will also very likely hurt the accuracy due to the much smaller number of weights. However, if you change the width and depth of your architecture to increase the weights again, you may reach the same accuracy of the original model with less parameters. At the same time a depthwise separable model with the same number of weights might achieve a higher accuracy compared the original model.

Application

You can use them wherever you can apply a CNN. I'm sure you will find use cases of depthwise separable models outside image related tasks. It's just that CNNs have been most popular around images.

Further Reading

Let me shamelessly point you to an article of mine that discusses different types of convolutions in Deep Learning with some information how they work. Maybe that helps as well.

Depthwise separable Convolution

Answers (2)

Related Questions