Rishabh Sharma

Reputation: 145

Why would I use a non-linear activation function in a CNN convolutional layer?

I was going through one of the deep learning lectures from MIT on CNNs. It said that when multiplying weights with pixel values, a non-linear activation function like ReLU can be applied to every pixel. I understand why it should be applied in a simple neural network, since it introduces non-linearity into our input data. But why would I want to apply it to a single pixel? Or am I getting it wrong?

Upvotes: 0

Views: 1004

Answers (1)

Aayushi Sanghi

Reputation: 41

You may have got it a little wrong.

When they say "multiplying weights with pixel values", they refer to the linear operation of applying the filter (weights + bias) to the pixels of the image. If you think about it, each filter in a CNN essentially computes a linear equation.

For example, take a 2*2 filter: it computes x1 * w1 + x2 * w2 + x3 * w3 + x4 * w4 + b for every 2*2 patch of the image it slides over. (Here x1, x2, x3, x4 are the pixels in the patch, w1, w2, w3, w4 are the weights in the filter, and b is the bias.)
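As a minimal NumPy sketch of that sliding linear operation (the image and weight values here are made up purely for illustration):

```python
import numpy as np

image = np.arange(16, dtype=float).reshape(4, 4)  # toy 4x4 "image"
w = np.array([[0.5, -0.5],
              [1.0,  0.0]])                        # 2x2 filter weights
b = 0.1                                            # bias

out = np.empty((3, 3))                             # "valid" convolution output
for i in range(3):
    for j in range(3):
        patch = image[i:i+2, j:j+2]
        # purely linear: x1*w1 + x2*w2 + x3*w3 + x4*w4 + b
        out[i, j] = np.sum(patch * w) + b
print(out)
```

Note that nothing non-linear has happened yet; `out` is just a stack of weighted sums.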

Now, hopefully it's fairly clear that the filter is essentially computing a linear equation. To perform a task like, say, image classification, we require some amount of non-linearity. This is achieved by applying an activation function, most popularly ReLU, to the filter's output.
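Continuing the sketch above, ReLU is just max(0, x) applied element-wise to the linear feature map `out`:

```python
# ReLU: clip every negative response in the linear feature map to zero.
activated = np.maximum(0.0, out)
print(activated)
```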

So you aren't applying the non-linearity to a "pixel" per se; you're applying it to the output of a linear operation (just like in a vanilla neural network), which consists of pixel values multiplied by the weights of a filter.

Hope this cleared your doubt; feel free to reach out for more help!

Upvotes: 1
