jshji
jshji

Reputation: 316

Keras' model.summary() not reflecting the size of the input layer?

In the example from 3b1b's video about Neural Network (the video), the model has 784 "neurons" in the input layer, followed by two 16-neuron dense layers, and a 10-neuron dense layer. (Please refer to the screenshot of the video provided below). This makes sense, because for example the first neuron in the input layer will have 16 'weights' (as in xw) so the number of weights is 784 * 16. And followed by 1616, and 16*10. There are also biases, which is same as the number of neurons in the dense layers. 3b1b says 13002!

Then I made the same model in Tensorflow, and the model.summary() shows the following:

Model: "model_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 input_1 (InputLayer)        [(None, 784, 1)]          0         
                                                                 
 dense_8 (Dense)             (None, 784, 16)           32        
                                                                 
 dense_9 (Dense)             (None, 784, 16)           272       
                                                                 
 dense_10 (Dense)            (None, 784, 10)           170       
                                                                 
=================================================================
Total params: 474
Trainable params: 474
Non-trainable params: 0
_________________________________________________________________

Code used to produce the above:

#I'm using Keras through Julia so the code may look different?
input_shape = (784,1)
inputs = layers.Input(input_shape)
outputs = layers.Dense(16)(inputs)
outputs = layers.Dense(16)(outputs)
outputs = layers.Dense(10)(outputs)
model = keras.Model(inputs, outputs)
model.summary()

Which does not reflect the input shape at all? So I made another model with input_shape=(1,1), and I get the same Total Params:

Model: "model_3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 input_10 (InputLayer)       [(None, 1, 1)]            0         
                                                                 
 dense_72 (Dense)            (None, 1, 16)             32        
                                                                 
 dense_73 (Dense)            (None, 1, 16)             272       
                                                                 
 dense_74 (Dense)            (None, 1, 10)             170       
                                                                 
=================================================================
Total params: 474
Trainable params: 474
Non-trainable params: 0
_________________________________________________________________

I don't think it's a bug, but I probably just don't understand what these mean / how Params are calculated.

Any help will be very appreciated.

Upvotes: 3

Views: 947

Answers (1)

AloneTogether
AloneTogether

Reputation: 26708

A Dense layer is applied to the last dimension of your input. In your case it is 1, instead of 784. What you actually want is:

import tensorflow as tf
input_shape = (784, )
inputs = tf.keras.layers.Input(input_shape)
outputs = tf.keras.layers.Dense(16)(inputs)
outputs = tf.keras.layers.Dense(16)(outputs)
outputs = tf.keras.layers.Dense(10)(outputs)
model = tf.keras.Model(inputs, outputs)
model.summary()
Model: "model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 input_2 (InputLayer)        [(None, 784)]             0         
                                                                 
 dense_3 (Dense)             (None, 16)                12560     
                                                                 
 dense_4 (Dense)             (None, 16)                272       
                                                                 
 dense_5 (Dense)             (None, 10)                170       
                                                                 
=================================================================
Total params: 13,002
Trainable params: 13,002
Non-trainable params: 0
_________________________________________________________________

From the TF docs:

Note: If the input to the layer has a rank greater than 2, then Dense computes the dot product between the inputs and the kernel along the last axis of the inputs and axis 0 of the kernel (using tf.tensordot). For example, if input has dimensions (batch_size, d0, d1), then we create a kernel with shape (d1, units), and the kernel operates along axis 2 of the input, on every sub-tensor of shape (1, 1, d1) (there are batch_size * d0 such sub-tensors). The output in this case will have shape (batch_size, d0, units).

Upvotes: 4

Related Questions