Reputation: 5791
I have a pretty simple LSTM NN architecture. After a few epochs (1-2) my PC totally freezes; I can't even move my mouse:
Layer (type)                 Output Shape              Param #
=================================================================
lstm_4 (LSTM)                (None, 128)               116224
_________________________________________________________________
dropout_3 (Dropout)          (None, 128)               0
_________________________________________________________________
dense_5 (Dense)              (None, 98)                12642
=================================================================
Total params: 128,866
Trainable params: 128,866
Non-trainable params: 0
# Same problem with a 2-layer LSTM with dropout and the Adam optimizer
# SEQUENCE_LENGTH = 3, len(chars) = 98
from keras.models import Sequential
from keras.layers import LSTM, Dropout, Dense
from keras.optimizers import RMSprop

model = Sequential()
model.add(LSTM(128, input_shape=(SEQUENCE_LENGTH, len(chars))))
# model.add(Dropout(0.15))
# model.add(LSTM(128))
model.add(Dropout(0.10))
model.add(Dense(len(chars), activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer=RMSprop(lr=0.01),
              metrics=['accuracy'])
This is how I train:
history = model.fit(X, y, validation_split=0.20, batch_size=128, epochs=10, shuffle=True, verbose=2).history
The NN needs 5 minutes to finish one epoch. A higher batch size doesn't make the problem occur faster. A more complex model can also train for longer while achieving almost the same accuracy - about 0.46 (full code here).
I have the latest up-to-date Linux Mint, a 1070 Ti with 8 GB, and 32 GB of RAM.
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.26 Driver Version: 396.26 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 107... Off | 00000000:08:00.0 On | N/A |
| 0% 35C P8 10W / 180W | 303MiB / 8116MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
Libraries:
Keras==2.2.0
Keras-Applications==1.0.2
Keras-Preprocessing==1.0.1
keras-sequential-ascii==0.1.1
keras-tqdm==2.0.1
tensorboard==1.8.0
tensorflow==1.0.1
tensorflow-gpu==1.8.0
I have tried limiting GPU memory usage, but that can't be the problem here, because during training it uses only 1 GB of GPU memory:
import tensorflow as tf
from keras.backend.tensorflow_backend import set_session

config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.9
config.gpu_options.allow_growth = True
set_session(tf.Session(config=config))
What is wrong here? How can I fix the problem?
Upvotes: 5
Views: 4146
Reputation: 21
I had this exact problem. The computer died after about 15 minutes of training. I found that it was a memory SIMM that died when it got warm/hot. If you have more than one SIMM, you can take them out one at a time to see which one is the culprit.
Upvotes: 2
Reputation: 5791
This is kind of weird to me, but the problem was related to my new AMD CPU, released just in April 2018. So having an up-to-date Linux kernel was crucial: following this guide https://itsfoss.com/upgrade-linux-kernel-ubuntu/ I updated the kernel from 4.13 to 4.17 - now everything works.
UPD: The motherboard was crashing the system as well; I have replaced it - now everything works well.
Upvotes: 0
Reputation: 757
Remove tensorflow==1.0.1 first. Try installing tensorflow-gpu==1.8.0 by building TensorFlow from sources as mentioned here, or replace LSTM with CuDNNLSTM while training the model on the GPU. Later, load the trained weights into the same model architecture with an LSTM layer to use the model on the CPU. (Make sure to use recurrent_activation='sigmoid' in the LSTM layer when re-loading CuDNNLSTM model weights!) A sketch of this weight transfer follows below.
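A minimal sketch of this train-on-GPU, load-on-CPU pattern, assuming X, y, SEQUENCE_LENGTH, and chars are defined as in the question (the build_model helper and the weights filename are illustrative, not from the original answer):
from keras.models import Sequential
from keras.layers import LSTM, CuDNNLSTM, Dropout, Dense
from keras.optimizers import RMSprop

def build_model(rnn_layer):
    # Hypothetical helper: same architecture as in the question,
    # parameterized over the recurrent layer.
    model = Sequential()
    model.add(rnn_layer)
    model.add(Dropout(0.10))
    model.add(Dense(len(chars), activation='softmax'))
    model.compile(loss='categorical_crossentropy',
                  optimizer=RMSprop(lr=0.01), metrics=['accuracy'])
    return model

# Train on the GPU with the cuDNN-fused implementation.
gpu_model = build_model(CuDNNLSTM(128, input_shape=(SEQUENCE_LENGTH, len(chars))))
gpu_model.fit(X, y, validation_split=0.20, batch_size=128, epochs=10)
gpu_model.save_weights('lstm_weights.h5')  # hypothetical filename

# Rebuild with a plain LSTM for CPU inference. CuDNNLSTM hard-codes
# tanh/sigmoid activations, so the LSTM must use
# recurrent_activation='sigmoid' (the Keras default is 'hard_sigmoid')
# for the loaded weights to behave the same.
cpu_model = build_model(LSTM(128, recurrent_activation='sigmoid',
                             input_shape=(SEQUENCE_LENGTH, len(chars))))
cpu_model.load_weights('lstm_weights.h5')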
Upvotes: 1