Reputation: 73
I have an encoder-decoder network with 3 BLSTMs in the encoder and 2 vanilla LSTMs in the decoder, connected through a multi-head attention layer with 4 heads. The latent dimension is 32 and my dataset has shape (10000, 400, 128). The encoder uses a dropout of 0.2 and the decoder a dropout of 0.3. I'm using the Adam optimizer with a learning rate of 0.001 and Mean Squared Error loss, with a validation split of 0.3. I rented an Nvidia Titan V (with a Core™ i9-9820X, 5.0/20 cores and 16/64 GB total effective shared RAM) on Vast.ai, and each epoch takes ~6 minutes when I train on everything (7000 training and 3000 validation samples).
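For reference, a rough Keras sketch of the architecture (the exact layer placement and the attention wiring are approximations of what I described above):

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

latent_dim = 32
timesteps, features = 400, 128

enc_in = layers.Input(shape=(timesteps, features))
x = enc_in
# 3 stacked bidirectional LSTMs in the encoder, dropout 0.2
for _ in range(3):
    x = layers.Bidirectional(
        layers.LSTM(latent_dim, return_sequences=True, dropout=0.2))(x)

# multi-head attention with 4 heads bridging encoder and decoder
attn = layers.MultiHeadAttention(num_heads=4, key_dim=latent_dim)(x, x)

# 2 vanilla LSTMs in the decoder, dropout 0.3
y = layers.LSTM(latent_dim, return_sequences=True, dropout=0.3)(attn)
y = layers.LSTM(latent_dim, return_sequences=True, dropout=0.3)(y)
out = layers.TimeDistributed(layers.Dense(features))(y)

model = Model(enc_in, out)
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001), loss="mse")
```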
I was hoping to find ways to reduce the total training time. Any suggestions would be great.
Upvotes: 0
Views: 2258
Reputation: 132
The first things that come to mind are early stopping callbacks and changing the batch size.
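As a sketch of what that could look like in Keras (the patience value and batch size here are just starting points to tune, and `x_train`/`y_train` stand in for your data):

```python
from tensorflow.keras.callbacks import EarlyStopping

early_stop = EarlyStopping(monitor="val_loss", patience=5, restore_best_weights=True)

model.fit(
    x_train, y_train,
    validation_split=0.3,
    batch_size=128,          # try larger batches to keep the GPU busy
    epochs=100,
    callbacks=[early_stop],  # stop once val_loss stops improving
)
```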
Although I haven't tried it myself, batch normalization is also considered to make training more efficient.
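A rough sketch of inserting batch normalization between the recurrent layers (whether it helps, and the exact placement, is something you'd have to experiment with):

```python
from tensorflow.keras import layers

inputs = layers.Input(shape=(400, 128))
x = layers.Bidirectional(layers.LSTM(32, return_sequences=True, dropout=0.2))(inputs)
x = layers.BatchNormalization()(x)   # normalize activations between recurrent layers
x = layers.Bidirectional(layers.LSTM(32, return_sequences=True, dropout=0.2))(x)
x = layers.BatchNormalization()(x)
```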
In my (not so relevant) case, I saw a great improvement in training speed and quality after normalizing the data. So, maybe data normalization/standardization could help a bit.
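For instance, a simple per-feature standardization using training-set statistics only (assuming your data keeps the (samples, timesteps, features) shape you described):

```python
import numpy as np

# x_train, x_val: arrays shaped (samples, 400, 128)
mean = x_train.mean(axis=(0, 1), keepdims=True)
std = x_train.std(axis=(0, 1), keepdims=True) + 1e-8  # avoid division by zero

x_train = (x_train - mean) / std
x_val = (x_val - mean) / std  # reuse the training statistics on validation data
```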
Last but not least, GRU networks tend to train faster, but in some cases they underperform compared to LSTM networks. I don't know if you are willing to change your model, but I thought I should mention it. A sketch of the swap is below (just an illustration, keeping your layer sizes and dropout unchanged).
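```python
from tensorflow.keras import layers

inputs = layers.Input(shape=(400, 128))
# bidirectional GRU in place of a BLSTM encoder layer
x = layers.Bidirectional(layers.GRU(32, return_sequences=True, dropout=0.2))(inputs)
# vanilla GRU in place of a decoder LSTM
y = layers.GRU(32, return_sequences=True, dropout=0.3)(x)
```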
Upvotes: 1