bonobo

Reputation: 132

Jupyter Notebook - Kernel dies during training - tensorflow-gpu 2.0, Python 3.6.8

Since I am fairly new to this field, I tried following the official TensorFlow tutorial on time series forecasting: https://www.tensorflow.org/tutorials/structured_data/time_series

The following problem occurs: when training the multivariate model, the kernel dies and restarts after 2 or 3 epochs.

This doesn't happen with the simpler univariate model, which has only one LSTM layer (I'm not sure whether that makes a difference).

Also, the problem only started today. Yesterday I could train the multivariate model without any errors.

As can be seen in the tutorial linked above, the model looks like this:

import tensorflow as tf

# x_train_multi comes from the tutorial's windowing step
multi_step_model = tf.keras.models.Sequential()
multi_step_model.add(tf.keras.layers.LSTM(32,
                                          return_sequences=True,
                                          input_shape=x_train_multi.shape[-2:]))
multi_step_model.add(tf.keras.layers.LSTM(16, activation='relu'))
multi_step_model.add(tf.keras.layers.Dense(72))

multi_step_model.compile(optimizer=tf.keras.optimizers.RMSprop(clipvalue=1.0), loss='mae')

The kernel dies while executing the following cell (usually after 2 or 3 epochs):

multi_step_history = multi_step_model.fit(train_data_multi, epochs=10,
                                          steps_per_epoch=300,
                                          validation_data=val_data_multi,
                                          validation_steps=50)

I have uninstalled and reinstalled TensorFlow and restarted my laptop, but nothing seems to work.

Any ideas?

OS: Windows 10 (Surface Book 1)

Upvotes: 0

Views: 1527

Answers (1)

bonobo

Reputation: 132

The problem was that the batch size was too large. Reducing it from 1024 to 256 stopped the kernel from crashing.

The solution comes from a comment by rbwendt in this thread on GitHub.
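For anyone looking for where to change this: a minimal sketch, assuming the training dataset is built with `tf.data` the way the tutorial does it. The random arrays and their shapes are hypothetical stand-ins for the tutorial's windowed data, and `BUFFER_SIZE` is an assumed value; only the change from 1024 to 256 comes from my actual fix.

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-ins for the tutorial's windowed arrays (shapes are assumptions).
x_train_multi = np.random.rand(2000, 120, 3).astype("float32")
y_train_multi = np.random.rand(2000, 72).astype("float32")

BATCH_SIZE = 256      # was 1024 when the kernel kept dying
BUFFER_SIZE = 10000   # shuffle buffer size, assumed value

# Rebuild the training dataset with the smaller batch size.
train_data_multi = tf.data.Dataset.from_tensor_slices((x_train_multi, y_train_multi))
train_data_multi = train_data_multi.cache().shuffle(BUFFER_SIZE).batch(BATCH_SIZE).repeat()

# Pull one batch to confirm the new batch dimension before calling fit().
x_batch, y_batch = next(iter(train_data_multi))
print(x_batch.shape, y_batch.shape)
```

The validation dataset would need to be rebuilt the same way. The memory used per training step grows with the batch size, so a smaller batch is the first thing worth trying when the kernel dies without an error message.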

Upvotes: 0
