Jornam

Reputation: 55

Tensorflow CNN MNIST example, weight dimensions

I just started programming in Tensorflow, although I'm already very comfortable with the concept of neural networks in general (it's weird, I know, blame my university). I've been trying to alter the implementation of this CNN example to get my own design to work. My question is about the weight initialization:

weights = {
    # 5x5 conv, 1 input, 32 outputs (i.e. 32 filters)
    'wc1': tf.Variable(tf.random_normal([5, 5, 1, 32])),
    # 5x5 conv, 32 inputs, 64 outputs
    'wc2': tf.Variable(tf.random_normal([5, 5, 32, 64])),
    # fully connected, 7*7*64 inputs, 1024 outputs
    'wd1': tf.Variable(tf.random_normal([7*7*64, 1024])),
    # 1024 inputs, 10 outputs (class prediction)
    'out': tf.Variable(tf.random_normal([1024, n_classes]))
}

If the second layer has 32 inputs and 64 outputs, does that mean that it applies only 2 filters? (That seems like very few.) And does that mean that, to implement 5 consecutive 3x3 conv layers, I should keep multiplying the previous number of outputs by the number of filters in that layer, like this:

weights = {
    'wc1': tf.Variable(tf.random_normal([3, 3, 1, 20])),
    'wc2': tf.Variable(tf.random_normal([3, 3, 20, 41])),
    'wc3': tf.Variable(tf.random_normal([3, 3, 20*41, 41])),
    'wc4': tf.Variable(tf.random_normal([3, 3, 20*41*41, 62])),
    'wc5': tf.Variable(tf.random_normal([3, 3, 20*41*41*62, 83])),
    'out': tf.Variable(tf.random_normal([3, 3, 20*41*41*62*83, n_classes]))
}

It just feels like I'm doing something wrong.

Upvotes: 0

Views: 654

Answers (1)

Anton Panchishin

Reputation: 3773

Yeah, you are doing something wrong.

Your input matrix is [batch,height,width,depth] where depth is initially 1.

Let's take a look at wc1 as an example: [3,3,1,20]. This means it has 20 different filters; each filter spans a depth of 1 and covers a height x width of 3x3. Each filter passes over the whole image, spanning its full depth. Since there are 20 different filters, this creates an output tensor of [batch,height,width,20].

Conceptually, we've changed the depth: what was pixel intensity is now 20 features per pixel, each describing the 3x3 neighborhood around that pixel.

If we then apply [3, 3, 20, 41], we create 41 filters, where each filter has a depth of 20 and a height x width of 3x3, and each slides across the whole height and width to generate one of the 41 output channels. The result is [batch,height,width,41], or 41 features per pixel.
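The key point, that each filter spans the full input depth and contributes one output channel, can be checked with a minimal NumPy sketch (this is not TensorFlow's actual implementation; shapes and the stride-1, no-padding setup are assumptions for illustration):

```python
import numpy as np

# A depth-20 feature map convolved with a [3, 3, 20, 41] filter bank,
# stride 1, no padding (what TensorFlow calls "VALID").
h, w, in_depth, n_filters = 7, 7, 20, 41
feature_map = np.random.randn(h, w, in_depth)
filters = np.random.randn(3, 3, in_depth, n_filters)

out_h, out_w = h - 2, w - 2  # a 3x3 window without padding trims 1 pixel per side
output = np.zeros((out_h, out_w, n_filters))
for i in range(out_h):
    for j in range(out_w):
        patch = feature_map[i:i + 3, j:j + 3, :]  # [3, 3, 20] window
        # Each filter multiplies the FULL-depth patch and sums to one number,
        # so 41 filters yield a depth-41 output pixel.
        output[i, j, :] = np.tensordot(patch, filters,
                                       axes=([0, 1, 2], [0, 1, 2]))

print(output.shape)  # (5, 5, 41)
```

Note that the output depth depends only on the number of filters (41), never on the product of previous depths.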

Your next transform, [3, 3, 20*41, 41], is wrong. You do not have a depth of 20*41; you have a depth of 41.

Here is the update that you need:

weights = {
    'wc1': tf.Variable(tf.random_normal([3, 3, 1, 20])),
    'wc2': tf.Variable(tf.random_normal([3, 3, 20, 41])),
    'wc3': tf.Variable(tf.random_normal([3, 3, 41, 41])),
    'wc4': tf.Variable(tf.random_normal([3, 3, 41, 62])),
    'wc5': tf.Variable(tf.random_normal([3, 3, 62, 83])),
    'out': tf.Variable(tf.random_normal([1, 1, 83, n_classes]))
}

Whether you apply max pooling and/or padding determines the output shape after applying wc5.

If you apply a [1,2,2,1] max_pool after wc1, then the [height,width] reduces from [28,28] to [14,14].

If after wc2 there is another [1,2,2,1] max_pool then [height,width] reduces from [14,14] to [7,7].

7 isn't evenly divisible by 2. If wc3 is applied without padding then the [height,width] reduces from [7,7] to [5,5]. Doing the same for wc4 -> [3,3] and again for wc5 -> [1,1].
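The shrinking [height,width] chain above can be verified with a few lines of arithmetic (a sketch assuming stride-1 convolutions without padding and stride-2 2x2 max pooling, as described):

```python
def conv_valid(size, k=3):
    """Spatial size after a k x k conv with stride 1 and no padding."""
    return size - k + 1

def max_pool(size, k=2):
    """Spatial size after a [1, 2, 2, 1] max_pool with stride 2."""
    return size // k

size = 28              # MNIST height/width
size = max_pool(size)  # wc1 + 2x2 max_pool: 28 -> 14
size = max_pool(size)  # wc2 + 2x2 max_pool: 14 -> 7
size = conv_valid(size)  # wc3, no padding:  7 -> 5
size = conv_valid(size)  # wc4:              5 -> 3
size = conv_valid(size)  # wc5:              3 -> 1
print(size)  # 1
```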

Finally, out would operate on a [batch,1,1,83] tensor, transforming it into a [batch,1,1,n_classes] tensor!

Upvotes: 2
