Reputation: 403130
I'd like to discuss convolution as applied to CNNs and image filtering. If you have an RGB image (dimensions of, say, 3xIxI) and K filters, each of size 3xFxF, then you end up with a Kx(I - F + 1)x(I - F + 1) output, assuming your stride is 1 and you only consider completely overlapping regions (no padding).
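To make the shape arithmetic above concrete, here is a minimal sketch. The helper name valid_output_shape and the choice of K = 6 are mine, purely for illustration, and the check uses scipy.signal.correlate in 'valid' mode only to confirm the output size (the shape is the same for convolution).

```python
import numpy as np
from scipy import signal


def valid_output_shape(num_filters, image_size, filter_size):
    """Output shape for stride 1 and no padding (hypothetical helper)."""
    side = image_size - filter_size + 1
    return (num_filters, side, side)


K, I, F = 6, 5, 2                      # illustrative values only
image = np.zeros((3, I, I))            # 3xIxI image
filters = np.zeros((K, 3, F, F))       # K filters, each 3xFxF

# Each filter yields a 1x(I-F+1)x(I-F+1) map; stack the K maps along axis 0.
out = np.concatenate(
    [signal.correlate(image, f, mode='valid', method='direct') for f in filters],
    axis=0,
)

print(out.shape, valid_output_shape(K, I, F))  # (6, 4, 4) (6, 4, 4)
```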
From all the material I've read on convolution, you're basically sliding each filter over the image, and at each position multiplying the filter elementwise with the patch underneath it and summing the products to get a single value.
For example:
I -> 3x5x5 matrix
F -> 3x2x2 matrix
I * F -> 1x4x4 matrix
(Assume * is the convolution operation.)
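The sliding-window description above can be sketched with explicit loops. This is only an illustration under my own assumptions: slide_filter is a made-up name, and the inputs are random. One thing worth noticing is that sliding the kernel as-is matches scipy.signal.correlate, while scipy.signal.convolve corresponds to sliding a kernel that has been reversed along every axis.

```python
import numpy as np
from scipy import signal


def slide_filter(image, kernel):
    """Slide a CxFxF kernel over a CxIxI image (stride 1, no padding).

    Hypothetical helper: at each position, multiply the kernel elementwise
    with the patch under it and sum everything into one value.
    """
    _, size, _ = image.shape
    _, f, _ = kernel.shape
    side = size - f + 1
    out = np.empty((1, side, side), dtype=np.float64)
    for row in range(side):
        for col in range(side):
            patch = image[:, row:row + f, col:col + f]
            out[0, row, col] = np.sum(patch * kernel)
    return out


rng = np.random.default_rng(0)
image = rng.integers(0, 10, size=(3, 5, 5)).astype(np.float64)
kernel = rng.integers(0, 10, size=(3, 2, 2)).astype(np.float64)

slid = slide_filter(image, kernel)
flipped = slide_filter(image, kernel[::-1, ::-1, ::-1])

# Sliding the kernel unchanged is cross-correlation.
print(np.allclose(slid, signal.correlate(image, kernel, mode='valid', method='direct')))
# Sliding the fully reversed kernel is convolution.
print(np.allclose(flipped, signal.convolve(image, kernel, mode='valid', method='direct')))
```

Both checks should print True; the output has shape 1x4x4, as in the example above.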
Now, since both your kernel and image have the same number of channels, you are going to end up separating your 3D convolution into a number of parallel 2D convolutions, followed by a matrix summation.
Therefore, the above example should for all intents and purposes (assuming there is no padding and we are only considering completely overlapping regions) be the same as this:
I -> 3x5x5 matrix
F -> 3x2x2 matrix
(I[0] * F[0]) + (I[1] * F[1]) + (I[2] * F[2]) -> 1x4x4 matrix
I am just separating the channels and convolving each one independently. Please look at this carefully and correct me if I'm wrong.
Now, on the assumption that this makes sense, I've carried out the following experiment in Python.
import numpy as np
import scipy.signal
# Random 3-channel 5x5 image and 3-channel 2x2 kernel
x = np.random.randint(0, 10, (3, 5, 5)).astype(np.float32)
w = np.random.randint(0, 10, (3, 2, 2)).astype(np.float32)
# r1: sum of per-channel 2D convolutions, reshaped to match the 3D result
r1 = np.sum([scipy.signal.convolve(x[i], w[i], 'valid') for i in range(3)], axis=0).reshape(1, 4, 4)
# r2: a single 3D convolution over all channels at once
r2 = scipy.signal.convolve(x, w, 'valid')
print(r1.shape)
print(r1)
print(r2.shape)
print(r2)
This gives me the following result:
(1, 4, 4)
[[[ 268.  229.  297.  305.]
  [ 256.  292.  322.  190.]
  [ 173.  240.  283.  243.]
  [ 291.  271.  302.  346.]]]
(1, 4, 4)
[[[ 247.  229.  291.  263.]
  [ 198.  297.  342.  233.]
  [ 208.  268.  268.  185.]
  [ 276.  272.  280.  372.]]]
I'd just like to know whether this is due to:
Or any combination of the above. Thanks for reading!
Upvotes: 1
Views: 930
Reputation: 114956
You wrote:
... the same as this:
I -> 3x5x5 matrix
F -> 3x2x2 matrix
(I[0] * F[0]) + (I[1] * F[1]) + (I[2] * F[2]) -> 1x4x4 matrix
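You have forgotten that convolution reverses one of its arguments, and in a 3D convolution that reversal applies to every axis, including the channel axis. The per-channel 2D convolutions already take care of the two spatial axes, but the channels have to be paired in reverse order as well. So the above is not true. Instead, the last line should be: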
(I[0] * F[2]) + (I[1] * F[1]) + (I[2] * F[0]) -> 1x4x4 matrix
For example,
In [28]: r1 = np.sum([scipy.signal.convolve(x[i], w[2-i], 'valid') for i in range(3)], axis=0).reshape(1, 4, 4)
In [29]: r2 = scipy.signal.convolve(x, w, 'valid')
In [30]: r1
Out[30]:
array([[[ 169.,  223.,  277.,  199.],
        [ 226.,  213.,  206.,  247.],
        [ 192.,  252.,  332.,  369.],
        [ 167.,  266.,  321.,  323.]]], dtype=float32)
In [31]: r2
Out[31]:
array([[[ 169.,  223.,  277.,  199.],
        [ 226.,  213.,  206.,  247.],
        [ 192.,  252.,  332.,  369.],
        [ 167.,  266.,  321.,  323.]]], dtype=float32)
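If you want to reproduce this as a standalone script, here is a minimal sketch with a fixed seed. The seed and the variable names are my own choices, so the numbers will differ from the session above. It also checks the related case of cross-correlation (scipy.signal.correlate), which does not reverse its argument, so there the channels pair up in the same order, as in your original formula.

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(0)  # arbitrary seed, for repeatability only
x = rng.integers(0, 10, size=(3, 5, 5)).astype(np.float64)
w = rng.integers(0, 10, size=(3, 2, 2)).astype(np.float64)

# Full 3D convolution: the kernel is reversed along all three axes.
full_conv = signal.convolve(x, w, mode='valid', method='direct')

# Per-channel 2D convolutions with the channel order reversed (w[2 - i]).
per_channel_conv = sum(
    signal.convolve2d(x[i], w[2 - i], mode='valid') for i in range(3)
)[np.newaxis]

# Cross-correlation does no reversal, so channel i pairs with channel i.
full_corr = signal.correlate(x, w, mode='valid', method='direct')
per_channel_corr = sum(
    signal.correlate2d(x[i], w[i], mode='valid') for i in range(3)
)[np.newaxis]

print(np.allclose(full_conv, per_channel_conv))  # True
print(np.allclose(full_corr, per_channel_corr))  # True
```

As far as I know, the "convolution" layers in most CNN frameworks compute the second variant (cross-correlation), which is why the channel-by-channel picture in the question matches what those layers do.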
Upvotes: 4