I found some results difficult to understand when trying to debug my neural network. I tried to do some computations offline using scipy (1.3.0), and I am not getting the same results as with keras (2.3.1) on a tensorflow (1.14.0) backend. Here is a minimal reproducible example:
from keras.layers import Conv2D, Input
from keras.models import Model
import numpy as np
from scipy.signal import convolve2d
image = np.array([[-1.16551484e-04, -1.88735046e-03, -7.90571701e-03,
-1.52302440e-02, -1.55315138e-02, -8.40757508e-03,
-2.12123734e-03, -1.49851941e-04],
[-1.88735046e-03, -3.05623915e-02, -1.28019482e-01,
-2.46627569e-01, -2.51506150e-01, -1.36146188e-01,
-3.43497843e-02, -2.42659380e-03],
[-7.90571701e-03, -1.28019482e-01, -5.06409585e-01,
-6.69258237e-01, -6.63918257e-01, -5.31925797e-01,
-1.43884048e-01, -1.01644937e-02],
[-1.52302440e-02, -2.46627569e-01, -6.69258296e-01,
2.44587708e+00, 2.72079444e+00, -6.30891442e-01,
-2.77190477e-01, -1.95817426e-02],
[-1.55315138e-02, -2.51506120e-01, -6.63918316e-01,
2.72079420e+00, 3.01719952e+00, -6.19484246e-01,
-2.82673597e-01, -1.99690927e-02],
[-8.40757508e-03, -1.36146188e-01, -5.31925797e-01,
-6.30891442e-01, -6.19484186e-01, -5.57167232e-01,
-1.53017864e-01, -1.08097391e-02],
[-2.12123734e-03, -3.43497805e-02, -1.43884048e-01,
-2.77190447e-01, -2.82673597e-01, -1.53017864e-01,
-3.86065207e-02, -2.72730505e-03],
[-1.49851941e-04, -2.42659380e-03, -1.01644937e-02,
-1.95817426e-02, -1.99690927e-02, -1.08097391e-02,
-2.72730505e-03, -1.92666746e-04]], dtype='float32')
kernel = np.array([[0.04277903, 0.5318366, 0.025291916],
                   [0.5756132, -0.493123, 0.116359994],
                   [0.10616145, -0.319581, -0.115053006]], dtype='float32')
print('Mean of original image', np.mean(image))
## Scipy result
res_scipy = convolve2d(image, kernel.T, mode='same')
print('Mean of convolution with scipy', np.mean(res_scipy))
## Keras result
def init(shape, dtype=None):
    return kernel[..., None, None]
im = Input((None, None, 1))
im_conv = Conv2D(1, 3, padding='same', use_bias=False, kernel_initializer=init)(im)
model = Model(im, im_conv)
model.compile(loss='mse', optimizer='adam')
res_keras = model.predict_on_batch(image[None, ..., None])
print('Mean of convolution with keras', np.mean(res_keras))
When visualizing the results, I found that they are actually symmetric (point symmetry around the center, modulo a small shift).
I tried some empirical fixes, like transposing the kernel, but that didn't change anything.
EDIT: Thanks to @kaya3's comment, I realized that rotating the kernel by 180 degrees did the trick. However, I still don't understand why I need to do this to get the same results.
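For reference, a minimal sketch of that fix, applying the 180-degree rotation to the same kernel.T passed to convolve2d above:

res_scipy_fixed = convolve2d(image, kernel.T[::-1, ::-1], mode='same')  # kernel rotated 180 degrees
print('Mean of fixed convolution with scipy', np.mean(res_scipy_fixed))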
Upvotes: 2
Views: 1145
I don't know for certain without reading the source code for these two libraries, but there is more than one straightforward way to write a convolution algorithm, and evidently these two libraries implement it in different ways.
One way is to "paint" the kernel onto the output, for each pixel of the image:
from itertools import product

def convolve_paint(img, ker):
    img_w, img_h = len(img[0]), len(img)
    ker_w, ker_h = len(ker[0]), len(ker)
    out_w, out_h = img_w + ker_w - 1, img_h + ker_h - 1
    out = [[0]*out_w for _ in range(out_h)]
    # Scatter: each image pixel adds its contribution across the kernel footprint.
    for x, y in product(range(img_w), range(img_h)):
        for dx, dy in product(range(ker_w), range(ker_h)):
            out[y+dy][x+dx] += img[y][x] * ker[dy][dx]
    return out
Another way is to "sum" the contributing amounts at each pixel in the output:
def convolve_sum(img, ker):
    img_w, img_h = len(img[0]), len(img)
    ker_w, ker_h = len(ker[0]), len(ker)
    out_w, out_h = img_w + ker_w - 1, img_h + ker_h - 1
    out = [[0]*out_w for _ in range(out_h)]
    # Gather: each output pixel sums the contributions that land on it.
    for x, y in product(range(out_w), range(out_h)):
        for dx, dy in product(range(ker_w), range(ker_h)):
            if 0 <= y-dy < img_h and 0 <= x-dx < img_w:
                out[y][x] += img[y-dy][x-dx] * ker[dy][dx]
    return out
These two functions produce the same output. However, notice that the second one has y-dy and x-dx instead of y+dy and x+dx. If the second algorithm were written with + instead of -, as might seem natural, the results would be as if the kernel had been rotated by 180 degrees, which is what you've observed.
It's unlikely that either library uses such a simple algorithm to do convolution. For larger images and kernels it's more efficient to use a Fourier transform, applying the convolution theorem. But the difference between the two libraries is likely to be caused by something similar to this.
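For illustration only, here is a minimal sketch of that FFT approach using numpy (this is not claimed to be what either library actually does):

def convolve_fft(img, ker):
    # Zero-pad both inputs to the full output size, multiply in the
    # frequency domain, and transform back (convolution theorem).
    out_h = img.shape[0] + ker.shape[0] - 1
    out_w = img.shape[1] + ker.shape[1] - 1
    F = np.fft.rfft2(img, (out_h, out_w)) * np.fft.rfft2(ker, (out_h, out_w))
    return np.fft.irfft2(F, (out_h, out_w))

print(np.allclose(convolve_fft(image, kernel),
                  convolve2d(image, kernel, mode='full'), atol=1e-5))  # expected: True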
Upvotes: 1
What is usually called convolution in neural networks (and image processing) is not exactly the mathematical concept of convolution, which is what convolve2d implements, but the similar one of correlation, which is implemented by correlate2d:

res_scipy = correlate2d(image, kernel.T, mode='same')
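Correlation and convolution differ only by a 180-degree rotation of the kernel, which is why the rotation trick in the question's EDIT works. A quick sketch of that identity (for an odd-sized kernel like this one, the 'same' cropping lines up exactly):

from scipy.signal import correlate2d, convolve2d

k = kernel.T
print(np.allclose(correlate2d(image, k, mode='same'),
                  convolve2d(image, k[::-1, ::-1], mode='same')))  # expected: True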
Upvotes: 3