user17800083

Very strange Keras behaviour - better loss with an error in the code

I accidentally forgot to change the input variable to x in the Conv1D calls. But when I train with that model, the loss is far better than when I fix the error.

The model with the error:

inputs = keras.layers.Input(shape=self.input)
concat = []
for _ in range(4):
    x = keras.layers.Conv1D(32, kernel_size=3, strides=1, dilation_rate=1, padding="same", activation="relu", use_bias=False)(inputs)
    x = keras.layers.Conv1D(64, kernel_size=3, strides=1, dilation_rate=1, padding="same", activation="relu", use_bias=False)(inputs) # <-- should be Conv1D(...)(x)
    x = keras.layers.Conv1D(128, kernel_size=3, strides=1, dilation_rate=1, padding="same", activation="relu", use_bias=False)(inputs) # <-- should be Conv1D(...)(x)
    x = keras.layers.LSTM(32, activation="sigmoid", return_sequences=True)(x)
    x = keras.layers.LSTM(32, activation="sigmoid", return_sequences=False)(x)
    concat.append(x)
x = keras.layers.Concatenate(axis=1)(concat)
x = keras.layers.Dense(128, activation="relu")(x)
x = keras.layers.Dense(128, activation="relu")(x)
outputs = keras.layers.Dense(self.output)(x)
self.model = keras.models.Model(inputs=inputs, outputs=outputs)

The model summary and training log of the model with the error:

Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_1 (InputLayer)            [(None, 24, 8)]      0                                            
__________________________________________________________________________________________________
conv1d_2 (Conv1D)               (None, 24, 128)      3072        input_1[0][0]                    
__________________________________________________________________________________________________
conv1d_5 (Conv1D)               (None, 24, 128)      3072        input_1[0][0]                    
__________________________________________________________________________________________________
conv1d_8 (Conv1D)               (None, 24, 128)      3072        input_1[0][0]                    
__________________________________________________________________________________________________
conv1d_11 (Conv1D)              (None, 24, 128)      3072        input_1[0][0]                    
__________________________________________________________________________________________________
lstm (LSTM)                     (None, 24, 32)       20608       conv1d_2[0][0]                   
__________________________________________________________________________________________________
lstm_2 (LSTM)                   (None, 24, 32)       20608       conv1d_5[0][0]                   
__________________________________________________________________________________________________
lstm_4 (LSTM)                   (None, 24, 32)       20608       conv1d_8[0][0]                   
__________________________________________________________________________________________________
lstm_6 (LSTM)                   (None, 24, 32)       20608       conv1d_11[0][0]                  
__________________________________________________________________________________________________
lstm_1 (LSTM)                   (None, 32)           8320        lstm[0][0]                       
__________________________________________________________________________________________________
lstm_3 (LSTM)                   (None, 32)           8320        lstm_2[0][0]                     
__________________________________________________________________________________________________
lstm_5 (LSTM)                   (None, 32)           8320        lstm_4[0][0]                     
__________________________________________________________________________________________________
lstm_7 (LSTM)                   (None, 32)           8320        lstm_6[0][0]                     
__________________________________________________________________________________________________
concatenate (Concatenate)       (None, 128)          0           lstm_1[0][0]                     
                                                                 lstm_3[0][0]                     
                                                                 lstm_5[0][0]                     
                                                                 lstm_7[0][0]                     
__________________________________________________________________________________________________
dense (Dense)                   (None, 128)          16512       concatenate[0][0]                
__________________________________________________________________________________________________
dense_1 (Dense)                 (None, 128)          16512       dense[0][0]                      
__________________________________________________________________________________________________
dense_2 (Dense)                 (None, 1)            129         dense_1[0][0]                    
==================================================================================================
Total params: 161,153
Trainable params: 161,153
Non-trainable params: 0
__________________________________________________________________________________________________
Epoch 1/250
628/628 [==============================] - 14s 16ms/step - loss: 1.0818 - precision: 0.5038 - val_loss: 1.0670 - val_precision: 0.5293
Epoch 2/250
628/628 [==============================] - 10s 15ms/step - loss: 1.0783 - precision: 0.5250 - val_loss: 1.0668 - val_precision: 0.5254
Epoch 3/250
628/628 [==============================] - 10s 15ms/step - loss: 1.0769 - precision: 0.5352 - val_loss: 1.0665 - val_precision: 0.5229
Epoch 4/250
628/628 [==============================] - 9s 15ms/step - loss: 1.0762 - precision: 0.5357 - val_loss: 1.0653 - val_precision: 0.5291
Epoch 5/250
628/628 [==============================] - 10s 15ms/step - loss: 1.0755 - precision: 0.5358 - val_loss: 1.0660 - val_precision: 0.5163
Epoch 6/250
628/628 [==============================] - 10s 15ms/step - loss: 1.0738 - precision: 0.5378 - val_loss: 1.0640 - val_precision: 0.5260
Epoch 7/250
628/628 [==============================] - 10s 15ms/step - loss: 1.0727 - precision: 0.5384 - val_loss: 1.0634 - val_precision: 0.5257
Epoch 8/250
628/628 [==============================] - 10s 15ms/step - loss: 1.0706 - precision: 0.5380 - val_loss: 1.0616 - val_precision: 0.5306
Epoch 9/250
628/628 [==============================] - 10s 15ms/step - loss: 1.0692 - precision: 0.5471 - val_loss: 1.0599 - val_precision: 0.5375
Epoch 10/250
628/628 [==============================] - 10s 15ms/step - loss: 1.0684 - precision: 0.5467 - val_loss: 1.0583 - val_precision: 0.5435
Epoch 11/250
628/628 [==============================] - 9s 15ms/step - loss: 1.0665 - precision: 0.5534 - val_loss: 1.0577 - val_precision: 0.5486
Epoch 12/250
628/628 [==============================] - 10s 15ms/step - loss: 1.0658 - precision: 0.5487 - val_loss: 1.0623 - val_precision: 0.5472
Epoch 13/250
628/628 [==============================] - 10s 15ms/step - loss: 1.0642 - precision: 0.5513 - val_loss: 1.0569 - val_precision: 0.5488
Epoch 14/250
628/628 [==============================] - 10s 15ms/step - loss: 1.0634 - precision: 0.5530 - val_loss: 1.0571 - val_precision: 0.5347
Epoch 15/250
628/628 [==============================] - 9s 15ms/step - loss: 1.0622 - precision: 0.5506 - val_loss: 1.0538 - val_precision: 0.5445
Epoch 16/250
628/628 [==============================] - 10s 15ms/step - loss: 1.0607 - precision: 0.5527 - val_loss: 1.0537 - val_precision: 0.5489
Epoch 17/250
628/628 [==============================] - 10s 15ms/step - loss: 1.0594 - precision: 0.5526 - val_loss: 1.0550 - val_precision: 0.5450
Epoch 18/250
628/628 [==============================] - 10s 15ms/step - loss: 1.0583 - precision: 0.5544 - val_loss: 1.0566 - val_precision: 0.5461
Epoch 19/250
628/628 [==============================] - 10s 15ms/step - loss: 1.0556 - precision: 0.5571 - val_loss: 1.0521 - val_precision: 0.5405
Epoch 20/250
628/628 [==============================] - 10s 15ms/step - loss: 1.0545 - precision: 0.5600 - val_loss: 1.0524 - val_precision: 0.5480
Epoch 21/250
628/628 [==============================] - 10s 15ms/step - loss: 1.0532 - precision: 0.5611 - val_loss: 1.0487 - val_precision: 0.5467
Epoch 22/250
628/628 [==============================] - 10s 15ms/step - loss: 1.0520 - precision: 0.5603 - val_loss: 1.0522 - val_precision: 0.5496
Epoch 23/250
628/628 [==============================] - 10s 15ms/step - loss: 1.0508 - precision: 0.5583 - val_loss: 1.0494 - val_precision: 0.5497
Epoch 24/250
628/628 [==============================] - 10s 15ms/step - loss: 1.0480 - precision: 0.5630 - val_loss: 1.0461 - val_precision: 0.5489
Epoch 25/250
628/628 [==============================] - 10s 15ms/step - loss: 1.0463 - precision: 0.5617 - val_loss: 1.0461 - val_precision: 0.5505
Epoch 26/250
628/628 [==============================] - 10s 15ms/step - loss: 1.0457 - precision: 0.5643 - val_loss: 1.0449 - val_precision: 0.5548
Epoch 27/250
628/628 [==============================] - 10s 15ms/step - loss: 1.0430 - precision: 0.5659 - val_loss: 1.0472 - val_precision: 0.5504
Epoch 28/250
628/628 [==============================] - 10s 15ms/step - loss: 1.0426 - precision: 0.5679 - val_loss: 1.0415 - val_precision: 0.5516
Epoch 29/250
628/628 [==============================] - 10s 15ms/step - loss: 1.0389 - precision: 0.5679 - val_loss: 1.0459 - val_precision: 0.5542
Epoch 30/250
628/628 [==============================] - 10s 15ms/step - loss: 1.0379 - precision: 0.5709 - val_loss: 1.0421 - val_precision: 0.5583
Epoch 31/250
628/628 [==============================] - 10s 15ms/step - loss: 1.0366 - precision: 0.5723 - val_loss: 1.0423 - val_precision: 0.5586
Epoch 32/250
628/628 [==============================] - 10s 15ms/step - loss: 1.0335 - precision: 0.5765 - val_loss: 1.0415 - val_precision: 0.5573
Epoch 33/250
628/628 [==============================] - 10s 15ms/step - loss: 1.0318 - precision: 0.5772 - val_loss: 1.0399 - val_precision: 0.5580
Epoch 34/250
628/628 [==============================] - 10s 15ms/step - loss: 1.0287 - precision: 0.5789 - val_loss: 1.0423 - val_precision: 0.5495
Epoch 35/250
628/628 [==============================] - 10s 15ms/step - loss: 1.0276 - precision: 0.5862 - val_loss: 1.0354 - val_precision: 0.5658
Epoch 36/250
628/628 [==============================] - 10s 15ms/step - loss: 1.0252 - precision: 0.5841 - val_loss: 1.0321 - val_precision: 0.5619
Epoch 37/250
628/628 [==============================] - 10s 15ms/step - loss: 1.0233 - precision: 0.5861 - val_loss: 1.0348 - val_precision: 0.5651
Epoch 38/250
628/628 [==============================] - 10s 15ms/step - loss: 1.0215 - precision: 0.5876 - val_loss: 1.0327 - val_precision: 0.5677
Epoch 39/250
628/628 [==============================] - 10s 15ms/step - loss: 1.0187 - precision: 0.5905 - val_loss: 1.0350 - val_precision: 0.5699
Epoch 40/250
628/628 [==============================] - 10s 15ms/step - loss: 1.0158 - precision: 0.5938 - val_loss: 1.0301 - val_precision: 0.5702
Epoch 41/250
628/628 [==============================] - 10s 15ms/step - loss: 1.0154 - precision: 0.5955 - val_loss: 1.0291 - val_precision: 0.5671
Epoch 42/250
628/628 [==============================] - 10s 15ms/step - loss: 1.0099 - precision: 0.5972 - val_loss: 1.0328 - val_precision: 0.5786
Epoch 43/250
628/628 [==============================] - 10s 15ms/step - loss: 1.0076 - precision: 0.5996 - val_loss: 1.0327 - val_precision: 0.5712
Epoch 44/250
628/628 [==============================] - 10s 15ms/step - loss: 1.0030 - precision: 0.6066 - val_loss: 1.0231 - val_precision: 0.5708
Epoch 45/250
628/628 [==============================] - 10s 15ms/step - loss: 0.9996 - precision: 0.6047 - val_loss: 1.0276 - val_precision: 0.5728
Epoch 46/250
628/628 [==============================] - 12s 19ms/step - loss: 0.9965 - precision: 0.6072 - val_loss: 1.0206 - val_precision: 0.5744
Epoch 47/250
628/628 [==============================] - 11s 18ms/step - loss: 0.9910 - precision: 0.6134 - val_loss: 1.0182 - val_precision: 0.5837
Epoch 48/250
628/628 [==============================] - 10s 16ms/step - loss: 0.9865 - precision: 0.6114 - val_loss: 1.0204 - val_precision: 0.5750
Epoch 49/250
628/628 [==============================] - 10s 15ms/step - loss: 0.9808 - precision: 0.6155 - val_loss: 1.0251 - val_precision: 0.5745
Epoch 50/250
628/628 [==============================] - 10s 15ms/step - loss: 0.9773 - precision: 0.6129 - val_loss: 1.0147 - val_precision: 0.5877
Epoch 51/250
628/628 [==============================] - 10s 15ms/step - loss: 0.9737 - precision: 0.6184 - val_loss: 1.0073 - val_precision: 0.5871
Epoch 52/250
628/628 [==============================] - 10s 15ms/step - loss: 0.9696 - precision: 0.6174 - val_loss: 1.0078 - val_precision: 0.5807
Epoch 53/250
628/628 [==============================] - 10s 15ms/step - loss: 0.9631 - precision: 0.6265 - val_loss: 1.0015 - val_precision: 0.5927
Epoch 54/250
628/628 [==============================] - 10s 15ms/step - loss: 0.9618 - precision: 0.6216 - val_loss: 1.0064 - val_precision: 0.5916
Epoch 55/250
628/628 [==============================] - 10s 15ms/step - loss: 0.9571 - precision: 0.6246 - val_loss: 1.0127 - val_precision: 0.5907
Epoch 56/250
628/628 [==============================] - 10s 15ms/step - loss: 0.9588 - precision: 0.6251 - val_loss: 1.0012 - val_precision: 0.5903
Epoch 57/250
628/628 [==============================] - 10s 15ms/step - loss: 0.9499 - precision: 0.6297 - val_loss: 1.0192 - val_precision: 0.5824
Epoch 58/250
628/628 [==============================] - 10s 15ms/step - loss: 0.9471 - precision: 0.6273 - val_loss: 1.0103 - val_precision: 0.5893
Epoch 59/250
628/628 [==============================] - 10s 15ms/step - loss: 0.9428 - precision: 0.6367 - val_loss: 0.9949 - val_precision: 0.5943
Epoch 60/250
628/628 [==============================] - 10s 15ms/step - loss: 0.9366 - precision: 0.6348 - val_loss: 0.9926 - val_precision: 0.5946
Epoch 61/250
628/628 [==============================] - 10s 15ms/step - loss: 0.9356 - precision: 0.6356 - val_loss: 0.9868 - val_precision: 0.6016
Epoch 62/250
628/628 [==============================] - 10s 15ms/step - loss: 0.9280 - precision: 0.6385 - val_loss: 0.9902 - val_precision: 0.5949
Epoch 63/250
628/628 [==============================] - 10s 15ms/step - loss: 0.9255 - precision: 0.6403 - val_loss: 0.9877 - val_precision: 0.5957
Epoch 64/250
628/628 [==============================] - 10s 15ms/step - loss: 0.9217 - precision: 0.6425 - val_loss: 1.0087 - val_precision: 0.5918
Epoch 65/250
628/628 [==============================] - 10s 15ms/step - loss: 0.9156 - precision: 0.6460 - val_loss: 1.0007 - val_precision: 0.5961
Epoch 66/250
628/628 [==============================] - 10s 15ms/step - loss: 0.9155 - precision: 0.6454 - val_loss: 0.9873 - val_precision: 0.5965
09-01-22 15:18:32 - Saving model weights to /vserver/storages/packages/trader/.cache/weights/linear/neuralnet.1.1.14.h5 ... done
09-01-22 15:18:32 - Trained the model in 10.7m.
Training evaluation: loss: 0.9263 - precision: 0.6409
Validation evaluation: loss: 0.9868 - precision: 0.6016

Now the model with the error fixed:

inputs = keras.layers.Input(shape=self.input)
concat = []
for _ in range(4):
    x = keras.layers.Conv1D(128, kernel_size=3, strides=1, dilation_rate=1, padding="same", activation="relu", use_bias=False)(inputs)
    x = keras.layers.LSTM(32, activation="sigmoid", return_sequences=True)(x)
    x = keras.layers.LSTM(32, activation="sigmoid", return_sequences=False)(x)
    concat.append(x)
x = keras.layers.Concatenate(axis=1)(concat)
x = keras.layers.Dense(128, activation="relu")(x)
x = keras.layers.Dense(128, activation="relu")(x)
outputs = keras.layers.Dense(self.output)(x)
self.model = keras.models.Model(inputs=inputs, outputs=outputs)

The model summary and training log of the model with the error fixed:

Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_1 (InputLayer)            [(None, 24, 8)]      0                                            
__________________________________________________________________________________________________
conv1d (Conv1D)                 (None, 24, 128)      3072        input_1[0][0]                    
__________________________________________________________________________________________________
conv1d_1 (Conv1D)               (None, 24, 128)      3072        input_1[0][0]                    
__________________________________________________________________________________________________
conv1d_2 (Conv1D)               (None, 24, 128)      3072        input_1[0][0]                    
__________________________________________________________________________________________________
conv1d_3 (Conv1D)               (None, 24, 128)      3072        input_1[0][0]                    
__________________________________________________________________________________________________
lstm (LSTM)                     (None, 24, 32)       20608       conv1d[0][0]                     
__________________________________________________________________________________________________
lstm_2 (LSTM)                   (None, 24, 32)       20608       conv1d_1[0][0]                   
__________________________________________________________________________________________________
lstm_4 (LSTM)                   (None, 24, 32)       20608       conv1d_2[0][0]                   
__________________________________________________________________________________________________
lstm_6 (LSTM)                   (None, 24, 32)       20608       conv1d_3[0][0]                   
__________________________________________________________________________________________________
lstm_1 (LSTM)                   (None, 32)           8320        lstm[0][0]                       
__________________________________________________________________________________________________
lstm_3 (LSTM)                   (None, 32)           8320        lstm_2[0][0]                     
__________________________________________________________________________________________________
lstm_5 (LSTM)                   (None, 32)           8320        lstm_4[0][0]                     
__________________________________________________________________________________________________
lstm_7 (LSTM)                   (None, 32)           8320        lstm_6[0][0]                     
__________________________________________________________________________________________________
concatenate (Concatenate)       (None, 128)          0           lstm_1[0][0]                     
                                                                 lstm_3[0][0]                     
                                                                 lstm_5[0][0]                     
                                                                 lstm_7[0][0]                     
__________________________________________________________________________________________________
dense (Dense)                   (None, 128)          16512       concatenate[0][0]                
__________________________________________________________________________________________________
dense_1 (Dense)                 (None, 128)          16512       dense[0][0]                      
__________________________________________________________________________________________________
dense_2 (Dense)                 (None, 1)            129         dense_1[0][0]                    
==================================================================================================
Total params: 161,153
Trainable params: 161,153
Non-trainable params: 0
__________________________________________________________________________________________________
Epoch 1/250
628/628 [==============================] - 14s 16ms/step - loss: 1.0800 - precision: 0.5006 - val_loss: 1.0678 - val_precision: 0.5036
Epoch 2/250
628/628 [==============================] - 9s 15ms/step - loss: 1.0792 - precision: 0.4970 - val_loss: 1.0678 - val_precision: 0.5091
Epoch 3/250
628/628 [==============================] - 10s 15ms/step - loss: 1.0791 - precision: 0.4990 - val_loss: 1.0680 - val_precision: 0.4909
Epoch 4/250
628/628 [==============================] - 10s 15ms/step - loss: 1.0791 - precision: 0.5016 - val_loss: 1.0683 - val_precision: 0.4909
Epoch 5/250
628/628 [==============================] - 10s 15ms/step - loss: 1.0791 - precision: 0.5018 - val_loss: 1.0678 - val_precision: 0.5091
Epoch 6/250
628/628 [==============================] - 10s 15ms/step - loss: 1.0790 - precision: 0.4996 - val_loss: 1.0678 - val_precision: 0.5091
09-01-22 15:04:12 - Saving model weights to /vserver/storages/packages/trader/.cache/weights/linear/neuralnet.1.1.14.h5 ... done
09-01-22 15:04:12 - Trained the model in 1.0m.
Training evaluation: loss: 1.0788 - precision: 0.5161
Validation evaluation: loss: 1.0678 - precision: 0.5036

As you can see, the model with the error performs far better than the one without, even though they are technically identical. I have tested it multiple times and the result stays the same. How is this possible?

Edit: the numbers of epochs differ because of the EarlyStopping callback.
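
For reference, that callback would look something like the sketch below; the exact monitor metric and patience value aren't shown in the runs above, so they are assumptions:

from tensorflow import keras

# Hypothetical reconstruction of the EarlyStopping callback mentioned above;
# monitor and patience are assumed, not taken from the logs shown.
early_stop = keras.callbacks.EarlyStopping(
    monitor="val_loss",        # stop once validation loss stops improving
    patience=5,                # assumed value
    restore_best_weights=True,
)
# passed to training as: model.fit(..., epochs=250, callbacks=[early_stop])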

Upvotes: 1

Views: 73

Answers (1)

Baran

Reputation: 131

These models are exactly the same, as you can see in the summaries. In the first code (with the error), the first two Conv1D layers of each branch are not connected to the output, so Keras drops them from the model graph, which makes both models identical. So what makes the difference between the results of those models? It is the number of epochs and the initial weights.
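
Here is a minimal sketch of why the "dead" layers vanish (assuming TensorFlow 2.x / tf.keras): a layer whose output never reaches the model's outputs is simply left out of the graph that Model builds.

from tensorflow import keras

inputs = keras.layers.Input(shape=(24, 8))
dead = keras.layers.Conv1D(32, 3, padding="same")(inputs)    # output never used
alive = keras.layers.Conv1D(128, 3, padding="same")(inputs)  # feeds the output
outputs = keras.layers.GlobalAveragePooling1D()(alive)
model = keras.models.Model(inputs=inputs, outputs=outputs)

model.summary()  # only the 128-filter Conv1D appears; the "dead" layer is gone

This is exactly why your first summary shows conv1d_2, conv1d_5, conv1d_8 and conv1d_11 but not the 32- and 64-filter layers, and why both summaries report the same 161,153 parameters.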

The initial weights are the random weights the model has before any training. So if you create two models that are identical in their layers and evaluate them on a validation set without training, they will still give different results, simply because their initial weights differ (the weights decide what the model outputs).
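
A minimal sketch of that effect (TensorFlow 2.x assumed): two freshly built models with identical layers give different outputs, purely because of their random initialization.

import numpy as np
from tensorflow import keras

def build():
    inp = keras.layers.Input(shape=(24, 8))
    out = keras.layers.Dense(1)(keras.layers.Flatten()(inp))
    return keras.models.Model(inp, out)

x = np.random.rand(4, 24, 8).astype("float32")
print(build()(x).numpy().ravel())  # some numbers
print(build()(x).numpy().ravel())  # different numbers, same architecture

# With TF >= 2.7 you can pin the initialization to make two runs comparable:
# keras.utils.set_random_seed(42)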

When it comes to the epochs, there could be a local-minimum problem: training can stall because the optimizer is stuck in a local minimum, and the EarlyStopping callback then thinks the best weights have been found and halts the run. I think this is what stopped the fixed model at the 6th epoch. You can google "local minima" if you don't know about it. But training shouldn't get stuck in a local minimum on every run (because the initial weights change on every run), so the problem should go away if you train the model a few times.
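
A runnable sketch of that suggestion: restart training a few times with different seeds and keep the best run. The tiny model and synthetic data here are stand-ins for your real setup (not your actual code), and keras.utils.set_random_seed needs TF >= 2.7.

import numpy as np
from tensorflow import keras

x = np.random.rand(256, 24, 8).astype("float32")
y = np.random.randint(0, 2, size=(256, 1)).astype("float32")

def build_model():  # stand-in for your model-building code
    inp = keras.layers.Input(shape=(24, 8))
    out = keras.layers.Dense(1, activation="sigmoid")(keras.layers.Flatten()(inp))
    return keras.models.Model(inp, out)

best_val, best_weights = float("inf"), None
for seed in range(3):
    keras.utils.set_random_seed(seed)  # new initial weights on each restart
    model = build_model()
    model.compile(optimizer="adam", loss="binary_crossentropy")
    hist = model.fit(
        x, y, validation_split=0.2, epochs=10, verbose=0,
        callbacks=[keras.callbacks.EarlyStopping(
            monitor="val_loss", patience=3, restore_best_weights=True)])
    val = min(hist.history["val_loss"])
    if val < best_val:
        best_val, best_weights = val, model.get_weights()
print("best val_loss:", best_val)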

Upvotes: 1
