I trained the ResNet50V2 model and I was wondering how the tensors transform from 3 channels to n channels.
I have the model as:
model.summary()
Model: "model_9"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_9 (InputLayer) (None, 164, 164, 3) 0
__________________________________________________________________________________________________
conv1_pad (ZeroPadding2D) (None, 170, 170, 3) 0 input_9[0][0]
__________________________________________________________________________________________________
conv1_conv (Conv2D) (None, 82, 82, 64) 9472 conv1_pad[0][0]
__________________________________________________________________________________________________
pool1_pad (ZeroPadding2D) (None, 84, 84, 64) 0 conv1_conv[0][0]
__________________________________________________________________________________________________
...
...
...
...
...
...
post_relu (Activation) (None, 6, 6, 2048) 0 post_bn[0][0]
__________________________________________________________________________________________________
flatten_9 (Flatten) (None, 73728) 0 post_relu[0][0]
__________________________________________________________________________________________________
dense_9 (Dense) (None, 37) 2727973 flatten_9[0][0]
==================================================================================================
Total params: 26,292,773
Trainable params: 26,247,333
Non-trainable params: 45,440
The first convolution layer, conv1_conv, has the following filter (kernel) weights:
layer = model.get_layer("conv1_conv")
filters = layer.get_weights()[0]  # index 0 is the kernel; index 1 is the bias
print(layer.name, filters.shape)
Output:
conv1_conv (7, 7, 3, 64)
What I don't understand is the convolution operation that turns the (170, 170, 3) tensor into the (82, 82, 64) tensor.
What does the 64 in conv1_conv indicate?
You can imagine the convolution as a 7 × 7 window sliding over the image. At each position, a filter takes the 7 × 7 × 3 numbers under the window (all three input channels) and makes a linear projection of them into a single number. Each filter therefore needs 7 × 7 × 3 = 147 weights (plus one bias), and you have 64 filters, which gives the kernel shape 7 × 7 × 3 × 64 and the 7 × 7 × 3 × 64 + 64 = 9472 parameters shown in the summary. So the 64 is simply the number of filters, i.e. the number of output channels.
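To make the sliding-window picture concrete, here is a minimal NumPy sketch of a naive "valid" convolution with stride 2. It is not how Keras implements Conv2D internally; it only assumes the same weight layout as Keras, (kernel_h, kernel_w, in_channels, out_channels), ignores the batch dimension, and uses random weights in place of the trained ones. The helper name `conv2d_valid` is hypothetical.

```python
import numpy as np

def conv2d_valid(x, kernel, bias, stride):
    """Naive 'valid' 2-D convolution (cross-correlation, as in Keras).

    x:      (H, W, C_in) input, already zero-padded
    kernel: (kh, kw, C_in, C_out), same layout as Keras Conv2D weights
    bias:   (C_out,)
    """
    kh, kw, c_in, c_out = kernel.shape
    h, w, _ = x.shape
    # Number of window positions along each axis
    out_h = (h - kh) // stride + 1
    out_w = (w - kw) // stride + 1
    # Flatten each filter into one column: (kh*kw*C_in, C_out), here (147, 64)
    k_flat = kernel.reshape(-1, c_out)
    out = np.empty((out_h, out_w, c_out), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            # The 7 x 7 x 3 patch under the window
            window = x[i * stride:i * stride + kh, j * stride:j * stride + kw, :]
            # Linear projection: 147 numbers -> one number per filter (64 total)
            out[i, j, :] = window.reshape(-1) @ k_flat + bias
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((170, 170, 3)).astype(np.float32)        # padded input
kernel = rng.standard_normal((7, 7, 3, 64)).astype(np.float32)   # random stand-in weights
bias = np.zeros(64, dtype=np.float32)

y = conv2d_valid(x, kernel, bias, stride=2)
print(y.shape)                   # (82, 82, 64)
print(kernel.size + bias.size)   # 9472
```

Each output channel `y[:, :, f]` is the map produced by filter `f` alone, which is why the channel count of the output equals the number of filters rather than the 3 input channels.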
The other important property of the convolution is its stride: the step by which the window moves. The window size is 7 and the padded image is 170 pixels wide and high, so the window's starting position can range from 0 to 170 - 7 = 163. With stride 2 the window visits positions 0, 2, 4, ..., 162, i.e. floor(163 / 2) + 1 = 82 positions along each axis (the count comes from flooring and adding one, not from rounding 81.5 up). At each of the 82 × 82 positions the window is projected by all 64 filters, therefore the output shape is 82 × 82 × 64.
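The same counting rule can be written as a small formula and used to trace the spatial size through the first layers of the summary. This is a sketch: the helper `conv_output_size` is hypothetical, and the padding amounts (3 pixels per side for conv1_pad, 1 for pool1_pad) are inferred from the shapes 164 → 170 and 82 → 84, assuming symmetric padding.

```python
def conv_output_size(n, kernel_size, stride, padding=0):
    """Number of window positions along one axis for a 'valid' convolution
    after `padding` zeros are added to each side of an axis of length n."""
    return (n + 2 * padding - kernel_size) // stride + 1

n = 164                                                      # input_9
n_pad = n + 2 * 3                                            # conv1_pad: 3 per side -> 170
n_conv = conv_output_size(n_pad, kernel_size=7, stride=2)    # conv1_conv -> 82
n_pool_pad = n_conv + 2 * 1                                  # pool1_pad: 1 per side -> 84
print(n_pad, n_conv, n_pool_pad)                             # 170 82 84
```

The "+ 1" matters: for a 171-pixel input, (171 - 7) / 2 = 82 exactly, yet the window fits at 83 positions.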