Reputation: 31
I am working on training a (small-scale) large language model and would like to parallelize the training on Google Colab. Specifically, I want to know if it's possible to utilize multiple TPUs or GPUs to speed up the training and handle large models more efficiently.
If possible, are there any online tutorials or open-source examples that demonstrate how to set this up?
I found an older post saying it's impossible: Distributed training in Tensorflow using multiple GPUs in Google Colab. I'm not sure whether that's still the case 4+ years later.
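For context, this is the kind of multi-device setup I was hoping would work on Colab: a minimal sketch using tf.distribute.MirroredStrategy with a placeholder Keras model (the layer sizes are arbitrary, not my actual model).

```python
import tensorflow as tf

# List the accelerators visible to this runtime; on a standard Colab
# GPU runtime this prints a single device, which is the crux of my question.
gpus = tf.config.list_physical_devices("GPU")
print("GPUs visible:", gpus)

# MirroredStrategy replicates the model across all visible GPUs and
# averages gradients; with only one GPU it degenerates to
# single-device training.
strategy = tf.distribute.MirroredStrategy()
print("Number of replicas:", strategy.num_replicas_in_sync)

with strategy.scope():
    # Placeholder language model; vocabulary and layer sizes are arbitrary.
    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(input_dim=10_000, output_dim=128),
        tf.keras.layers.LSTM(128),
        tf.keras.layers.Dense(10_000),
    ])
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    )
```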
Upvotes: 0
Views: 169
Reputation: 41
As mentioned in the old post, you can't place the same model on many GPU instances. There is, however, the concept of Federated Learning, where you train on multiple instances and aggregate the results from them. I'm not sure how well this applies to training LLMs, but it's worth a try.
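To illustrate the aggregation idea, here is a minimal sketch of federated averaging. It assumes each separate session returns its trained weights via model.get_weights(); the helper name federated_average is hypothetical, not part of any library.

```python
import numpy as np

# Hypothetical federated-averaging step: each instance trains its own
# copy of the model, and a coordinator averages the resulting weights.
# `local_weight_sets` stands in for weight lists gathered from N
# separate sessions (e.g. via model.get_weights() on each).
def federated_average(local_weight_sets):
    # Average each layer's weights elementwise across all instances.
    return [
        np.mean([weights[i] for weights in local_weight_sets], axis=0)
        for i in range(len(local_weight_sets[0]))
    ]

# Usage: suppose three sessions each returned their trained weights.
# averaged = federated_average([weights_a, weights_b, weights_c])
# global_model.set_weights(averaged)
```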
Upvotes: 2