erfan momeni

Reputation: 9

How can we decrease the training time of neural networks without increasing memory?

I want to train an RNN model (GRU + LSTM). My training data consists of 500,000 English texts, and I want to train and evaluate the model on that data. When I measured it, the training time for a single epoch with a batch size of 20 is really high (6 hours).

Layer (type)                          Output Shape         Param #
==================================================================
input_4 (InputLayer)                  [(None, 149, 1)]     0
gru_6 (GRU)                           (None, 169)          87204
dense_6 (Dense)                       (None, 128)          21760
repeat_vector_3 (RepeatVector)        (None, 169, 128)     0
gru_7 (GRU)                           (None, 169, 128)     99072
time_distributed_3 (TimeDistributed)  (None, 169, 165627)  21365883
==================================================================
Total params: 21,573,919
Trainable params: 21,573,919
Non-trainable params: 0


This is a summary of the neural network's layers. I want to know whether there is any approach for decreasing the training time (without increasing memory usage or the batch size).

Upvotes: 0

Views: 626

Answers (1)

Simon David

Reputation: 1036

You can look into using mixed precision, which is available in both TensorFlow/Keras and PyTorch. By reducing the precision of your model, you reduce its memory footprint: for example, a model that uses float16 (16 bits) instead of float32 (32 bits) requires half the memory, which is why you can usually double your batch size and thereby speed up training significantly.
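As a rough illustration, here is a minimal sketch of enabling mixed precision in TensorFlow/Keras for a model shaped like the one in the question. The layer sizes mirror the summary above, except the output vocabulary, which is shrunk here from 165627 to 1000 just to keep the example light; this is not the asker's exact model.

```python
# Minimal sketch: mixed precision in TensorFlow/Keras (assumes TF 2.x).
import tensorflow as tf
from tensorflow.keras import layers, mixed_precision

# Computations run in float16; variables are kept in float32.
mixed_precision.set_global_policy("mixed_float16")

inputs = tf.keras.Input(shape=(149, 1))
x = layers.GRU(169)(inputs)
x = layers.Dense(128, activation="relu")(x)
x = layers.RepeatVector(169)(x)
x = layers.GRU(128, return_sequences=True)(x)
# Keep the final softmax in float32 for numerical stability.
outputs = layers.TimeDistributed(
    layers.Dense(1000, activation="softmax", dtype="float32")
)(x)

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```

When you use `model.compile()`/`model.fit()`, Keras applies loss scaling automatically under the `mixed_float16` policy; only custom training loops need an explicit `LossScaleOptimizer`.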

However, please bear in mind that using the wrong dtypes may not lead to any performance improvement. For example, using bfloat16 on CPUs without AMX (Advanced Matrix Extensions) will not speed up your training and can actually make it slower. Therefore, you should read up thoroughly on the topic; the mixed-precision guides in the TensorFlow and PyTorch documentation are a good starting point.

Upvotes: 0
