Dilated Convolution, atrous, receptive fields

Question

I do not understand how the receptive field of a dilated convolution is computed. For a regular 3x3 convolution, for example, the receptive field is 3x3, and for two stacked 3x3 convolutions it is 5x5. But how does it work for a dilated convolution?

Does a 4-dilation mean one 3x3 convolution, then a 5x5 convolution (with only 9 nonzero values), then a 7x7 convolution (with only 9 nonzero values), and finally a 9x9 convolution? If so, I do not understand the value of 15x15 for the 4-dilation. For the 3-dilation I obtained 13x13, if my calculation is right. Where am I wrong?

What's the use of dilated convolutions?

Cris Luengo · Accepted Answer

As you can see in the answer to the question you linked, the dilation (the distance between the input values of each convolution) grows exponentially from layer to layer, not linearly. The 4-dilated convolution as shown there uses 3 layers (I link the image here for reference):

Each time you apply a 3x3 convolution, but the distance between the input samples grows: 1 in the first layer, 2 in the second layer, and 4 in the third layer. In the second layer, each of the 9 red dots carries the 3x3 receptive field of the first layer, so the first two layers together have a 7x7 receptive field. In the third layer, these 7x7 blocks are again repeated for each of the 9 red dots, leading to a 15x15 receptive field.
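The arithmetic above can be checked with a short sketch. It assumes stride-1 convolutions, where a layer with kernel size k and dilation d widens the receptive field by (k - 1) * d along each axis. The function name `receptive_field` is made up for illustration and does not come from any library.

```python
def receptive_field(dilations, kernel_size=3):
    """Receptive field along one axis of stacked stride-1 dilated convolutions.

    Assumption: every layer uses the same kernel size and stride 1.
    """
    rf = 1  # start from a single input sample
    for d in dilations:
        # each layer extends the field by (kernel_size - 1) * d samples
        rf += (kernel_size - 1) * d
    return rf


print(receptive_field([1, 1]))     # two regular 3x3 convolutions: 5
print(receptive_field([1, 2]))     # dilations 1 and 2: 7
print(receptive_field([1, 2, 4]))  # dilations 1, 2 and 4: 15
```

Summing the per-layer contributions gives 1 + 2 + 4 + 8 = 15, which is why the third layer sees 15x15 rather than 9x9.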

Note that it should be possible to increase these distances further, since there is quite a bit of overlap (darker shaded areas in the figure): a step size of 3 in the 2nd layer would lead to 9x9, and a step size of 9 in the 3rd layer to 27x27. I don't know whether this would lead to worse performance; I have never applied this technique.
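A rough way to see the overlap is to count, along one axis, how many distinct paths connect each input offset to a single output sample. This is only a sketch under the same stride-1, 3-tap assumptions; `tap_counts` is a hypothetical helper, and it says nothing about how either choice of distances performs in a trained network.

```python
from collections import Counter


def tap_counts(dilations, kernel_size=3):
    """Map each 1-D input offset to the number of paths reaching one output."""
    half = kernel_size // 2
    counts = Counter({0: 1})  # the output sample itself
    for d in dilations:
        spread = Counter()
        for pos, n in counts.items():
            # each kernel tap sits at an offset of j * d from the centre
            for j in range(-half, half + 1):
                spread[pos + j * d] += n
        counts = spread
    return counts


for dilations in ([1, 2, 4], [1, 3, 9]):
    counts = tap_counts(dilations)
    width = max(counts) - min(counts) + 1
    overlapping = sum(1 for n in counts.values() if n > 1)
    print(dilations, "width:", width, "offsets reached more than once:", overlapping)
```

With distances 1, 2, 4 the 27 paths land on only 15 offsets, so some inputs are reached several times. With distances 1, 3, 9 the 27 paths land on 27 different offsets, with no gaps and no overlap.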
