Reputation: 6223
I've got a Keras model, with sizes as follows:
________________________________________________________________________________
Layer (type)                     Output Shape              Param #
================================================================================
stft (InputLayer)                (None, 1, 16384)          0
________________________________________________________________________________
static_stft (Spectrogram)        (None, 1, 65, 256)        16640
________________________________________________________________________________
conv2d_1 (Conv2D)                (None, 38, 5, 9)          12882
________________________________________________________________________________
dense_1 (Dense)                  (None, 38, 5, 512)        5120
________________________________________________________________________________
predictions (Dense)              (None, 38, 5, 368)        188784
================================================================================
I'm confused about the dimensionality of the Dense layers at the end. I was hoping for (None, 512) and (None, 368) respectively, as suggested by answers like: Keras lstm and dense layer
The final Dense layers are created as follows:
x = keras.layers.Dense(512)(x)
outputs = keras.layers.Dense(
    368, activation='sigmoid', name='predictions')(x)
So why do they have more than 512 outputs? And how can I change this?
Upvotes: 0
Views: 419
Reputation: 1177
Depending on your application, you could flatten after the Conv2D layer:
from tensorflow.keras.layers import Input, Reshape, Flatten, Dense

input_layer = Input((1, 1710))
x = Reshape((38, 5, 9))(input_layer)  # stand-in for the (None, 38, 5, 9) conv output
x = Flatten()(x)                      # collapses (38, 5, 9) -> (1710,)
x = Dense(512)(x)
x = Dense(368)(x)
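Wrapping these layers in a Model and calling summary() (presumably how the table below was produced) would look like:
from tensorflow.keras import Model

model = Model(input_layer, x)
model.summary()  # prints the table below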
Layer (type)                 Output Shape              Param #
_________________________________________________________________
input_1 (InputLayer)         [(None, 1, 1710)]         0
_________________________________________________________________
reshape (Reshape)            (None, 38, 5, 9)          0
_________________________________________________________________
flatten (Flatten)            (None, 1710)              0
_________________________________________________________________
dense (Dense)                (None, 512)               876032
_________________________________________________________________
dense_1 (Dense)              (None, 368)               188784
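Applied to the model in the question, the same idea is just a Flatten between conv2d_1 and the first Dense layer; a minimal sketch, assuming x holds the conv2d_1 output:
# Assumes `x` is the (None, 38, 5, 9) output of conv2d_1 in the question's model.
x = keras.layers.Flatten()(x)   # (None, 38, 5, 9) -> (None, 1710)
x = keras.layers.Dense(512)(x)  # -> (None, 512)
outputs = keras.layers.Dense(
    368, activation='sigmoid', name='predictions')(x)  # -> (None, 368)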
Upvotes: 1
Reputation: 1665
It's the Conv2D layer. The convolution produces a 38x5 grid of length-9 feature vectors, and a Dense layer only operates on the last axis, so it maps each of those 38x5 length-9 vectors to a length-512 vector, giving the (None, 38, 5, 512) shape you see.
To get rid of the spatial dependence, you'll want something like a pooling layer, possibly a GlobalMaxPool2D. This consolidates the data into only the channel dimension and produces a (None, 9) shaped output, which will lead to your expected shapes from the Dense layers.
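A minimal sketch of that change, again assuming x holds the conv2d_1 output (GlobalMaxPooling2D is the full layer name; GlobalMaxPool2D is an alias):
# Assumes `x` is the (None, 38, 5, 9) output of conv2d_1; with the default
# channels_last format, pooling runs over the 38x5 spatial grid.
x = keras.layers.GlobalMaxPooling2D()(x)  # (None, 38, 5, 9) -> (None, 9)
x = keras.layers.Dense(512)(x)            # -> (None, 512)
outputs = keras.layers.Dense(
    368, activation='sigmoid', name='predictions')(x)  # -> (None, 368)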
Upvotes: 0