Convolution for a grayscale image is straightforward: you have a filter of shape nxnx1 and convolve it over the input image to extract whatever features you desire.
I also understand how convolution would work for an RGB image: the filter would have a shape of nxnx3. However, would all 3 'layers' in the filter hold the same kernel? For example, if the 0th layer holds a particular map of values, would layers 1 and 2 hold those exact same values? I am asking in regard to Convolutional Neural Networks, not conventional image processing. I understand the weights of each filter are learned and are randomized initially; am I correct in thinking that each layer would have different randomized values?
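For concreteness, here is a minimal sketch (assuming a Keras Conv2D layer, as used in answers like the one below) of the two filter shapes I mean:
import tensorflow as tf
import numpy as np
# A hypothetical 3x3 filter over a grayscale (1-channel) and an RGB (3-channel) input
gray_conv = tf.keras.layers.Conv2D(filters=1, kernel_size=3)
rgb_conv = tf.keras.layers.Conv2D(filters=1, kernel_size=3)
gray_conv(np.zeros((1, 8, 8, 1), dtype=np.float32))  # calling the layer builds its weights
rgb_conv(np.zeros((1, 8, 8, 3), dtype=np.float32))
print(gray_conv.get_weights()[0].shape)  # (3, 3, 1, 1) -> the nxnx1 case
print(rgb_conv.get_weights()[0].shape)   # (3, 3, 3, 1) -> the nxnx3 case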
The short answer is no. The longer answer is that there isn't a kernel per layer; instead there is just one kernel which handles all input and output layers at once.
The code below shows, step by step, how one would calculate each convolution manually, and from it we can see that at a high level the calculation goes like this:
All the colors are processed at once using a matrix multiplication with the kernel matrix. If we look at that kernel matrix, the values used to generate the first filter sit in the first column, and the values used to generate the second filter sit in the second column. So, indeed, the values are different and not reused, but they are not stored or applied separately.
import tensorflow as tf
import numpy as np
# Define a conv layer with a 3x3 kernel that will produce an output with 2 filters (channels)
conv_layer = tf.keras.layers.Conv2D(filters=2, kernel_size=3)
# Let's create a random input image
starting_image = np.array( np.random.rand(1,4,4,3), dtype=np.float32)
# and process it
result = conv_layer(starting_image)
weight, bias = conv_layer.get_weights()
print('size of weight', weight.shape)
print('size of bias', bias.shape)
size of weight (3, 3, 3, 2)
size of bias (2,)
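As a quick aside that answers the original question directly (a small check added here, not strictly needed for the walkthrough): each color channel gets its own slice of this weight, and with random initialization those slices will almost surely differ.
# weight[:, :, c, f] is the 3x3 kernel that filter f applies to color channel c
# If the channels shared a kernel these slices would be identical; they are not
print(weight[:, :, 0, 0])  # values applied to channel 0 by filter 0
print(weight[:, :, 1, 0])  # values applied to channel 1 by filter 0
print(np.allclose(weight[:, :, 0, 0], weight[:, :, 1, 0]))  # almost surely False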
# The output of the convolution of the 4x4x3 image input
# is a 2x2x2 output (because we don't have padding)
result.numpy()
array([[[[-0.34940776, -0.6426925 ],
[-0.81834394, -0.16166998]],
[[-0.37515935, -0.28143463],
[-0.60084903, -0.5310158 ]]]], dtype=float32)
# Now let's see how we can recreate this using the weights
# The way convolution is done is to extract a patch
# the size of the kernel (3x3 in this case)
# We will use the first patch, the first three rows and columns and all the colors
patch = starting_image[0,:3,:3,:]
print('patch.shape' , patch.shape)
# Then we flatten the patch
flat_patch = np.reshape( patch, [1,-1] )
print('New shape is', flat_patch.shape)
patch.shape (3, 3, 3)
New shape is (1, 27)
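(A quick sanity check on that flattening, relying on NumPy's default row-major reshape: the three color values of each pixel end up adjacent in the flattened vector, with the channel index varying fastest.)
# With C-order reshape the channel index varies fastest, so the first
# 3 entries of flat_patch are the 3 colors of the top-left pixel
print(np.allclose(flat_patch[0, :3], patch[0, 0, :]))  # True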
# next we take the weight and reshape it to be [-1,filters]
flat_weight = np.reshape( weight, [-1,2] )
print('flat_weight shape is ',flat_weight.shape)
flat_weight shape is (27, 2)
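(Another small check, under the same row-major assumption: the first column of flat_weight really does hold the first filter's values, matching the claim made above.)
# Column 0 of flat_weight is weight[..., 0] flattened in the same order,
# i.e. all 27 values that produce the first output filter
print(np.allclose(flat_weight[:, 0], weight[:, :, :, 0].reshape(-1)))  # True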
# we have the patch of shape [1,27] and the weight of [27,2]
# doing a matrix multiplication of the two shapes [1,27]*[27,2] = a shape of [1,2]
# which is the output we want, 2 filter outputs for this patch
output_for_patch = np.matmul(flat_patch,flat_weight)
# but we haven't added the bias yet, so lets do that
output_for_patch = output_for_patch + bias
# Finally, we can see that our manual calculation matches
# what Conv2D does exactly for the first patch
output_for_patch
array([[-0.34940773, -0.64269245]], dtype=float32)
If we compare this to the full convolution output above, we can see that our manual result is exactly the value for the first patch:
array([[[[-0.34940776, -0.6426925 ],
[-0.81834394, -0.16166998]],
[[-0.37515935, -0.28143463],
[-0.60084903, -0.5310158 ]]]], dtype=float32)
We would repeat this process for each patch. If we want to optimize this code some more, instead of passing only one image patch at a time (shape [1,27]) we can pass all the patches at once (shape [num_patches,27]) and the kernel will process them in one matrix multiplication, returning [num_patches,filters], as sketched below.
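Here is a minimal sketch of that batched version (my addition, reusing the variables defined above): extract all four 3x3 patches, flatten them into a [4, 27] matrix, and reproduce the whole Conv2D output with one matrix multiplication.
# Gather every 3x3 patch of the 4x4 image (stride 1, no padding -> 2x2 positions)
patches = []
for i in range(2):
    for j in range(2):
        patches.append(starting_image[0, i:i+3, j:j+3, :].reshape(-1))
patches = np.stack(patches)                              # shape [4, 27]
# One matmul produces all filter outputs for all patches at once
manual_result = np.matmul(patches, flat_weight) + bias   # shape [4, 2]
manual_result = manual_result.reshape(1, 2, 2, 2)        # back to the Conv2D layout
print(np.allclose(manual_result, result.numpy(), atol=1e-6))  # True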