Nikhil
Nikhil

Reputation: 143

Depthwise separable Convolution

I am new to Deep Learning and I recently came across Depth Wise Separable Convolutions. They significantly reduce the computation required to process the data and need only like 10% of standard convolution step computation.

I am curious as to what is the intuition behind doing this? We do achieve greater speed by reducing number of parameters and doing less computation, but is there a trade-off in performance?

Also, is it only used for some specific use cases like images, etc or can it be applied to all forms of data?

Upvotes: 2

Views: 2716

Answers (2)

Learninger
Learninger

Reputation: 61

Intuitively, depthwise separable conovolutions (DSCs) model the spatial correlation and cross-channel correlation separately while regular convolutions model them simultaneously. In our recent paper published on BMVC 2018, we give a mathematical proof that DSC is nothing but the principal component of regular convolution. This means it can capture the most effective part of regular convolution and discard other redundant parts, making it super efficient. For the tradeoff, in our paper, we gave some empirical results on VGG16 data-free regular convolution decomposition. However, with enough fine-tuning, we could almost reduce the accuracy degradation. Hope our paper helps you understand DSC further.

Upvotes: 6

pietz
pietz

Reputation: 2533

Intuition

The intuition behind doing this is to decouple the spatial information (width and height) and the depthwise information (channels). While regular convolutional layers will merge feature maps over the number of input channels, depthwise separable convolutions will perform another 1x1 convolution before adding them up.

Performance

Using a depthwise separable convolutional layer as a drop-in replacement for a regular one will greatly reduce the number of weights in the model. It will also very likely hurt the accuracy due to the much smaller number of weights. However, if you change the width and depth of your architecture to increase the weights again, you may reach the same accuracy of the original model with less parameters. At the same time a depthwise separable model with the same number of weights might achieve a higher accuracy compared the original model.

Application

You can use them wherever you can apply a CNN. I'm sure you will find use cases of depthwise separable models outside image related tasks. It's just that CNNs have been most popular around images.

Further Reading

Let me shamelessly point you to an article of mine that discusses different types of convolutions in Deep Learning with some information how they work. Maybe that helps as well.

Upvotes: 4

Related Questions