Reputation: 824
I was reading the following statement about how convolution is equivariant with respect to translation from the Deep Learning Book.
Let g be a function mapping one image function to another image function, such that I' = g(I) is the image function with I'(x, y) = I(x − 1, y). This shifts every pixel of I one unit to the right. If we apply this transformation to I, then apply convolution, the result will be the same as if we applied convolution to I', then applied the transformation g to the output.
In the last sentence (which I bolded), they apply convolution to I', but shouldn't this be I? I' is already the translated image. Otherwise the statement would effectively be saying:
f(g(I)) = g( f(g(I)) )
where f is convolution and g is translation. I am trying to verify this myself in Python, using a 3D kernel whose depth equals the depth of the image, as would be the case in a convolution layer applied to a colour image (here, a picture of a house).
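As a quick sanity check I first tried the 1D case (a toy example of my own, not from the book), where shifting and then convolving agrees with convolving and then shifting, away from the boundaries:

import numpy as np

x = np.arange(20, dtype=float)       # toy 1D "image"
k1 = np.array([1.0, -2.0, 1.0])      # toy 1D kernel
shift = 3

def shift_right(a, d):
    # translate by d samples, padding with zeros on the left
    out = np.zeros_like(a)
    out[d:] = a[:-d]
    return out

conv_then_shift = shift_right(np.convolve(x, k1, mode='same'), shift)
shift_then_conv = np.convolve(shift_right(x, shift), k1, mode='same')

# identical except right next to the edges, where the zero padding differs
print(np.allclose(conv_then_shift[shift+1:-1], shift_then_conv[shift+1:-1]))   # True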
Here is my code for applying a translation and then a convolution to the image.
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import numpy as np
import scipy
import scipy.misc
import scipy.ndimage

I = scipy.ndimage.imread('pics/house.jpg')

def convolution(A, B):
    return np.sum(np.multiply(A, B))

# 3x3x3 kernel, one 3x3 slice per colour channel
k = np.array([[[0,1,-1],[1,-1,0],[0,0,0]], [[-1,0,-1],[1,-1,0],[1,0,0]], [[1,-1,0],[1,0,1],[-1,0,1]]])

## Translation
translated = 100
new_I = np.zeros((I.shape[0]-translated, I.shape[1], I.shape[2]))
for i in range(translated, I.shape[0]):
    for j in range(I.shape[1]):
        for l in range(I.shape[2]):
            new_I[i-translated, j, l] = I[i, j, l]

## Convolution (stride 2)
conv = np.zeros((int((new_I.shape[0]-3)/2), int((new_I.shape[1]-3)/2)))
for i in range(conv.shape[0]):
    for j in range(conv.shape[1]):
        conv[i, j] = convolution(new_I[2*i:2*i+3, 2*j:2*j+3, :], k)

scipy.misc.imsave('pics/convoled_image_2nd.png', conv)
I get the following output:
Now I switch the convolution and translation steps:
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import numpy as np
import scipy
import scipy.misc
import scipy.ndimage

I = scipy.ndimage.imread('pics/house.jpg')

def convolution(A, B):
    return np.sum(np.multiply(A, B))

# 3x3x3 kernel, one 3x3 slice per colour channel
k = np.array([[[0,1,-1],[1,-1,0],[0,0,0]], [[-1,0,-1],[1,-1,0],[1,0,0]], [[1,-1,0],[1,0,1],[-1,0,1]]])

## Convolution (stride 2)
conv = np.zeros((int((I.shape[0]-3)/2), int((I.shape[1]-3)/2)))
for i in range(conv.shape[0]):
    for j in range(conv.shape[1]):
        conv[i, j] = convolution(I[2*i:2*i+3, 2*j:2*j+3, :], k)

## Translation
translated = 100
new_I = np.zeros((conv.shape[0]-translated, conv.shape[1]))
for i in range(translated, conv.shape[0]):
    for j in range(conv.shape[1]):
        new_I[i-translated, j] = conv[i, j]

scipy.misc.imsave('pics/conv_trans_image.png', new_I)
And now I get the following output:
Shouldn't they be the same according to the book? What am I doing wrong?
Upvotes: 3
Views: 927
Reputation: 27251
Just as the book says, convolution commutes with translation (both are shift-invariant linear operations), so their order is interchangeable, apart from boundary effects.
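In symbols (my notation, not the book's), write T_d for translation by d pixels, so (T_d I)(x, y) = I(x − d, y). Then

$$\big((T_d I) * K\big)(x, y) = \sum_{u,v} I(x - u - d,\, y - v)\, K(u, v) = (I * K)(x - d,\, y) = \big(T_d (I * K)\big)(x, y),$$

so translating and then convolving agrees with convolving and then translating, except where the sum runs off the edge of the image.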
For instance:
import numpy as np
from scipy import misc, ndimage, signal

def translate(img, dx):
    img_t = np.zeros_like(img)
    if dx == 0: img_t[:, :] = img[:, :]
    elif dx > 0: img_t[:, dx:] = img[:, :-dx]
    else: img_t[:, :dx] = img[:, -dx:]
    return img_t

def convolution(img, k):
    # sum the 2D convolutions of the individual colour channels
    return np.sum([signal.convolve2d(img[:, :, c], k[:, :, c])
                   for c in range(img.shape[2])], axis=0)

img = ndimage.imread('house.jpg')

k = np.array([
    [[ 0,  1, -1], [1, -1, 0], [ 0, 0, 0]],
    [[-1,  0, -1], [1, -1, 0], [ 1, 0, 0]],
    [[ 1, -1,  0], [1,  0, 1], [-1, 0, 1]]])

ct = translate(convolution(img, k), 100)
tc = convolution(translate(img, 100), k)

misc.imsave('conv_then_trans.png', ct)
misc.imsave('trans_then_conv.png', tc)

if np.all(ct[2:-2, 2:-2] == tc[2:-2, 2:-2]):
    print('Equal!')
Prints:
Equal!
The problem is that you are over-translating in the second example. Your stride-2 convolution shrinks the image by a factor of 2, so after it you should translate by 50 instead of 100.
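A minimal 1D sketch of that bookkeeping (my own toy example, with a shift of 4 and a stride of 2 standing in for your 100 and 2, and the same multiply-and-sum "convolution" as in your code): with stride s, a shift of d rows in the input shows up as a shift of d // s rows in the strided output, provided d is a multiple of s.

import numpy as np

x = np.arange(30, dtype=float)
k1 = np.array([1.0, -2.0, 1.0])
stride, d = 2, 4                      # d must be a multiple of stride

def strided_conv(a):
    # valid-mode multiply-and-sum with k1, keeping every stride-th output
    out = np.array([np.dot(a[i:i+3], k1) for i in range(len(a) - 2)])
    return out[::stride]

shifted = np.zeros_like(x)
shifted[d:] = x[:-d]                  # translate the input by d samples

a = strided_conv(shifted)             # translate, then strided convolution
b = strided_conv(x)                   # strided convolution ...
b_shifted = np.zeros_like(b)
b_shifted[d // stride:] = b[:-(d // stride)]   # ... then translate by d // stride

# equal away from the zero-padded edge
print(np.allclose(a[d // stride:], b_shifted[d // stride:]))   # True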
Upvotes: 1