Andi Giga

Reputation: 4162

TensorflowJs conv2d - Tensor Shapes

I want to create a machine learning model for audio files. I converted the audio files into a (spectrogram) tensor. My feature tensor (the audio files) has the shape [119, 241, 125] (119 files, 241 samples/file, 125 frequencies/sample). By sample, I mean the frequency values measured within one time window, e.g. 16 ms. My output shape will be [119, numOptions].

I followed this tutorial from Tensorflow.js on audio recognition. They build this model:

Model

I reshape my features tensor to be 4D for the conv2d layers: this.features = this.features.reshape([this.features.shape[0], this.features.shape[1], this.features.shape[2], 1])

    buildModel() {
        const inputShape1 = [this.features.shape[1], this.features.shape[2],this.features.shape[3]];
        this.model = tfNode.sequential();
        // filter to the image => feature extractor, edge detector, sharpener (depends on the models understanding)
        this.model.add(tfNode.layers.conv2d(
            {filters: 8, kernelSize: [4, 2], activation: 'relu', inputShape: inputShape1}
        ));

        // see the image at a higher level, generalize it more, prevent overfit
        this.model.add(tfNode.layers.maxPooling2d(
            {poolSize: [2, 2], strides: [2, 2]}
        ));

        // filter to the image => feature extractor, edge detector, sharpener (depends on the models understanding)
        const inputShape2 = [119,62,8];
        this.model.add(tfNode.layers.conv2d(
            {filters: 32, kernelSize: [4, 2], activation: 'relu', inputShape: inputShape2}
        ));

        // see the image at a higher level, generalize it more, prevent overfit
        this.model.add(tfNode.layers.maxPooling2d(
            {poolSize: [2, 2], strides: [2, 2]}
        ));

        // filter to the image => feature extractor, edge detector, sharpener (depends on the models understanding)
        const inputShape3 = [58,30,32];
        this.model.add(tfNode.layers.conv2d(
            {filters: 32, kernelSize: [4, 2], activation: 'relu', inputShape: inputShape3}
        ));

        // see the image at a higher level, generalize it more, prevent overfit
        this.model.add(tfNode.layers.maxPooling2d(
            {poolSize: [2, 2], strides: [2, 2]}
        ));

        // 1D output, => final output score of labels
        this.model.add(tfNode.layers.flatten({}));

        // prevents overfitting, randomly set 0
        this.model.add(tfNode.layers.dropout({rate: 0.25}));

        // learn anything linear, non linear comb. from conv. and soft pool
        this.model.add(tfNode.layers.dense({units: 2000, activation: 'relu'}));

        this.model.add(tfNode.layers.dropout({rate: 0.25}));

        // give probability for each label
        this.model.add(tfNode.layers.dense({units: this.labels.shape[1], activation: 'softmax'}));

        this.model.summary();

        // compile the model
        this.model.compile({loss: 'meanSquaredError', optimizer: 'adam'});
        this.model.summary()
    };

Model summary:

_________________________________________________________________
Layer (type)                 Output shape              Param #   
=================================================================
conv2d_Conv2D1 (Conv2D)      [null,238,124,8]          72        
_________________________________________________________________
max_pooling2d_MaxPooling2D1  [null,119,62,8]           0         
_________________________________________________________________
conv2d_Conv2D2 (Conv2D)      [null,116,61,32]          2080      
_________________________________________________________________
max_pooling2d_MaxPooling2D2  [null,58,30,32]           0         
_________________________________________________________________
conv2d_Conv2D3 (Conv2D)      [null,55,29,32]           8224      
_________________________________________________________________
max_pooling2d_MaxPooling2D3  [null,27,14,32]           0         
_________________________________________________________________
flatten_Flatten1 (Flatten)   [null,12096]              0         
_________________________________________________________________
dropout_Dropout1 (Dropout)   [null,12096]              0         
_________________________________________________________________
dense_Dense1 (Dense)         [null,2000]               24194000  
_________________________________________________________________
dropout_Dropout2 (Dropout)   [null,2000]               0         
_________________________________________________________________
dense_Dense2 (Dense)         [null,2]                  4002      
=================================================================
Total params: 24208378
Trainable params: 24208378
Non-trainable params: 0
_________________________________________________________________
Epoch 1 / 10
eta=0.0 ======================================>----------------------------------------------------------------------------- loss=0.515 0.51476
eta=0.8 ============================================================================>--------------------------------------- loss=0.442 0.44186
eta=0.0 ===================================================================================================================> 
3449ms 32236us/step - loss=0.485 val_loss=0.958 
Epoch 2 / 10
eta=0.0 ======================================>----------------------------------------------------------------------------- loss=0.422 0.42188
eta=0.9 ============================================================================>--------------------------------------- loss=0.395 0.39535
eta=0.0 ===================================================================================================================> 
3643ms 34043us/step - loss=0.411 val_loss=0.958 
Epoch 3 / 10

1) The first input shape comes from my features tensor. The other two inputShapes (inputShape2, inputShape3) were taken from the error messages I got. How can I determine these two input shapes in advance?

Upvotes: 1

Views: 563

Answers (1)

edkeveked

Reputation: 18381

How is the inputShape calculated?

The inputShape itself is not calculated; the dataset passed to the model has to match it. Only the first layer needs an inputShape, because each later layer infers its input shape from the output of the layer before it. In the model definition the inputShape is 3D, but the model summary shows a fourth dimension with the value null: the batch size. As a result, the training data must be 4D. The first dimension (the batch size) can be anything; what matters is that the features and the labels have the same batch size. There is a more detailed answer here
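A minimal sketch of that relationship in plain JavaScript, using the shapes from the question. The helper names inputShapeFromData and checkBatchSizes are hypothetical and not part of TensorFlow.js, and the labels shape [119, 2] assumes numOptions = 2, as the last line of the model summary suggests.

```javascript
// Hypothetical helper: the per-sample inputShape is the data shape
// without its first (batch) dimension.
function inputShapeFromData(dataShape) {
  return dataShape.slice(1);
}

// Hypothetical helper: features and labels must agree on the batch dimension.
function checkBatchSizes(featuresShape, labelsShape) {
  if (featuresShape[0] !== labelsShape[0]) {
    throw new Error(
      `Batch size mismatch: ${featuresShape[0]} features vs ${labelsShape[0]} labels`
    );
  }
  return featuresShape[0];
}

// Shapes taken from the question after the reshape to 4D.
const featuresShape = [119, 241, 125, 1];
const labelsShape = [119, 2]; // assumption: numOptions = 2

const inputShape = inputShapeFromData(featuresShape); // [241, 125, 1]
const batchSize = checkBatchSizes(featuresShape, labelsShape); // 119

console.log('inputShape:', inputShape, 'batch size:', batchSize);
```

The resulting inputShape is what the first conv2d layer receives; the batch dimension is what shows up as null in the summary.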

How are the layer shapes calculated?

It depends on the layers used. Layers such as dropout and activation don't change the input shape.

  • Depending on its kernel size, strides and padding, a convolution layer changes the input shape. This answer details how it is calculated.

  • A flatten layer simply reshapes its input to one dimension (plus the batch dimension). In the model summary, the input shape [null,27,14,32] becomes [null,12096] after the flatten layer (12096 = 27 * 14 * 32).

  • The dense layer also changes the shape. Its output shape depends on the number of units of that layer.
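The shapes in the model summary can be reproduced by hand. The sketch below assumes 'valid' padding and the strides used in the question (1 for the convolutions, 2 for the pooling layers), where each spatial dimension becomes floor((input - window) / stride) + 1. The function names are illustrative only and do not come from TensorFlow.js.

```javascript
// Output size of one spatial dimension with 'valid' padding (no padding).
function validOutputSize(inputSize, windowSize, stride = 1) {
  return Math.floor((inputSize - windowSize) / stride) + 1;
}

// Shape after a conv2d layer; shapes are [height, width, channels], no batch.
function conv2dShape([h, w], filters, [kh, kw], [sh, sw] = [1, 1]) {
  return [validOutputSize(h, kh, sh), validOutputSize(w, kw, sw), filters];
}

// Shape after a maxPooling2d layer; the channel count is unchanged.
function maxPool2dShape([h, w, c], [ph, pw], [sh, sw]) {
  return [validOutputSize(h, ph, sh), validOutputSize(w, pw, sw), c];
}

// Walk through the three conv + pool stages from the question.
let shape = [241, 125, 1];
const shapes = [];
for (const filters of [8, 32, 32]) {
  shape = conv2dShape(shape, filters, [4, 2]);
  shapes.push(shape);
  shape = maxPool2dShape(shape, [2, 2], [2, 2]);
  shapes.push(shape);
}

// Flatten multiplies the remaining dimensions together.
const flattened = shape.reduce((a, b) => a * b, 1);

console.log(shapes);
console.log('flattened size:', flattened);
```

The intermediate shapes after the first and second pooling layers, [119, 62, 8] and [58, 30, 32], are the values the question used for inputShape2 and inputShape3.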

Upvotes: 1
