user785099

Reputation: 5563

Translational equivariance and its relationship with convolutional layers and spatial pooling layers

In the context of convolutional neural network model, I once heard a statement that:

One desirable property of convolutions is that they are translationally equivariant, and the introduction of spatial pooling can corrupt this property of translational equivariance.

What does this statement mean, and why?

Upvotes: 2

Views: 1374

Answers (2)

prosti

Reputation: 46449

A note on the terms translation equivariance and translation invariance: they are different.

Translation equivariance means that a translation of the input features results in an equivalent translation of the outputs. This is desirable when we need to find where a pattern is located (e.g. its bounding rectangle).

Translation invariance means that a translation of the input does not change the output at all.

Translation invariance is important to achieve. It effectively means that after learning a certain pattern in the lower-left corner of a picture, our convnet can recognize the pattern anywhere, including in the upper-right corner.
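To make the difference between the two terms concrete, here is a minimal NumPy sketch (the 1-d pattern and kernel are made up for illustration): the convolution response shifts together with the input pattern (equivariance), while a global max over that response stays the same wherever the pattern sits (invariance).

    import numpy as np

    # Made-up 1-d "image" containing a small pattern, and a made-up kernel.
    x = np.array([0, 0, 1, 2, 1, 0, 0, 0, 0, 0], dtype=float)
    x_shifted = np.roll(x, 3)            # same pattern, moved 3 steps to the right
    k = np.array([1, 2, 1], dtype=float)

    # CNN-style "convolution" is cross-correlation, hence np.correlate.
    feat = np.correlate(x, k, mode='valid')
    feat_shifted = np.correlate(x_shifted, k, mode='valid')

    # Equivariance: the response peak moves together with the input pattern.
    print(feat.argmax(), feat_shifted.argmax())   # 2 and 5 -- positions differ by the shift

    # Invariance: a global max-pool discards the position, so the output
    # is identical no matter where the pattern sits.
    print(feat.max(), feat_shifted.max())         # 6.0 and 6.0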

A densely connected network alone, without convolution layers in between, cannot achieve translation invariance.

We introduce convolution layers to bring this generalization power to deep networks and to learn representations from fewer training samples.

Upvotes: 0

Salvador Dali

Reputation: 222811

Most probably you heard it from Bengio's book. I will try to give you my explanation.


In a rough sense, two transformations are equivariant if f(g(x)) = g(f(x)). In your case of convolutions and translations, this means that convolve(translate(x)) gives the same result as translate(convolve(x)). This is desirable because if your convolution finds the eye of a cat in an image, it will still find that eye after you shift the image.

You can see this for yourself (I use a 1-d convolution only because it is easy to calculate). Let's convolve v = [4, 1, 3, 2, 3, 2, 9, 1] with k = [5, 1, 2]. The result is [27, 12, 23, 17, 35, 21].

Now let's shift v by prepending an element: v' = [8] + v. Convolving with k you get [46, 27, 12, 23, 17, 35, 21]. As you can see, the result is just the previous result with one new value prepended.
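If you want to reproduce these numbers, here is a small NumPy sketch. Note that "convolution" as used in CNNs does not flip the kernel, so np.correlate is the matching NumPy call.

    import numpy as np

    v = np.array([4, 1, 3, 2, 3, 2, 9, 1])
    k = np.array([5, 1, 2])

    # CNN-style "convolution" is cross-correlation (no kernel flip).
    print(np.correlate(v, k, mode='valid'))          # [27 12 23 17 35 21]

    # Shift the input by prepending a new element.
    v_shifted = np.concatenate(([8], v))
    print(np.correlate(v_shifted, k, mode='valid'))  # [46 27 12 23 17 35 21]
    # The old outputs reappear unchanged, only shifted by one position:
    # the convolution is translationally equivariant.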


Now the part about spatial pooling. Let's do a max-pooling of size 3 on the first result and on the second one. In the first case you get [27, 35], in the second [46, 35, 21]. As you see, 27 has disappeared (the result was corrupted), and it would be corrupted even more with average pooling.
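The same pooling step in NumPy, using a small helper (written here just for the demo) that pools non-overlapping windows and keeps the trailing partial window, as in the hand computation above:

    import numpy as np

    def max_pool_1d(x, size=3):
        # Non-overlapping max-pooling; a trailing partial window is kept.
        return np.array([x[i:i + size].max() for i in range(0, len(x), size)])

    print(max_pool_1d(np.array([27, 12, 23, 17, 35, 21])))      # [27 35]
    print(max_pool_1d(np.array([46, 27, 12, 23, 17, 35, 21])))  # [46 35 21]
    # After the shift, 27 no longer survives the pooling: the pooled outputs
    # are not just shifted copies of each other, so equivariance is corrupted.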

P.S. Max/min pooling is the most translationally invariant of all poolings (if you can say so, judging by the number of non-corrupted elements).

Upvotes: 4
