Reputation: 9
I want to train an RNN model (GRU + LSTM). My training data consists of 500,000 English text samples, and I want to train and evaluate the model on that data. When I checked the training time, one epoch with a batch size of 20 takes about 6 hours, which is really high.
Layer (type)                          Output Shape         Param #
===================================================================
input_4 (InputLayer)                  [(None, 149, 1)]     0
gru_6 (GRU)                           (None, 169)          87204
dense_6 (Dense)                       (None, 128)          21760
repeat_vector_3 (RepeatVector)        (None, 169, 128)     0
gru_7 (GRU)                           (None, 169, 128)     99072
time_distributed_3 (TimeDistributed)  (None, 169, 165627)  21365883
===================================================================
Total params: 21,573,919
Trainable params: 21,573,919
Non-trainable params: 0
This is the summary of the neural network layers. Is there any approach to reduce the training time (without increasing memory usage or the batch size)?
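For reference, the summary above corresponds to a model along these lines. This is only a sketch inferred from the printed shapes and parameter counts (the vocabulary size of 165627 and the sequence lengths 149/169 are read off the summary), not necessarily the exact code:

import tensorflow as tf
from tensorflow.keras import layers

inputs = tf.keras.Input(shape=(149, 1))
x = layers.GRU(169)(inputs)                          # encoder GRU  -> (None, 169)
x = layers.Dense(128, activation="relu")(x)          # bottleneck   -> (None, 128)
x = layers.RepeatVector(169)(x)                      #              -> (None, 169, 128)
x = layers.GRU(128, return_sequences=True)(x)        # decoder GRU  -> (None, 169, 128)
outputs = layers.TimeDistributed(
    layers.Dense(165627, activation="softmax"))(x)   # per-step distribution over the vocabulary
model = tf.keras.Model(inputs, outputs)
model.summary()   # reproduces the 21,573,919 trainable parameters shown above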
Upvotes: 0
Views: 626
Reputation: 1036
You can look into the use of mixed precision, which is available for both TensorFlow/Keras and PyTorch. By reducing the precision of your model's computations, you reduce its memory footprint. For example, a model that uses float16 (16 bits) instead of float32 (32 bits) needs half the memory, which is why you can usually double your batch size and thereby speed up your training significantly.
However, please bear in mind that using the wrong dtypes may not lead to any performance improvement - e.g. using bfloat16 on CPUs without AMX (Advanced Matrix Extensions) will not speed up your training and can actually make it slower. Therefore, you should read up thoroughly on the topic and use the provided links as a starting point.
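As a minimal sketch of enabling mixed precision in TensorFlow/Keras (assuming TF 2.4 or newer): the layer sizes below are taken from the summary in the question, and the loss/optimizer are illustrative placeholders, not the asker's exact code.

import tensorflow as tf
from tensorflow.keras import layers

# float16 compute with float32 variables on GPU; on TPU 'mixed_bfloat16' is the usual choice.
tf.keras.mixed_precision.set_global_policy("mixed_float16")

inputs = tf.keras.Input(shape=(149, 1))
x = layers.GRU(169)(inputs)
x = layers.Dense(128, activation="relu")(x)
x = layers.RepeatVector(169)(x)
x = layers.GRU(128, return_sequences=True)(x)
# Keep the final softmax in float32 for numerical stability, as the mixed-precision guide recommends.
outputs = layers.TimeDistributed(
    layers.Dense(165627, activation="softmax", dtype="float32"))(x)

model = tf.keras.Model(inputs, outputs)
# With the 'mixed_float16' policy, compile()/fit() handle the required loss scaling
# automatically, so the rest of the training code stays the same.
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

Whether this actually speeds things up depends on your hardware (e.g. GPUs with float16 support benefit the most), so benchmark one epoch before and after enabling it.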
Upvotes: 0