Ivan To

Reputation: 67

Training Multiple Models in Google Colab

I am building an ensemble of models and need to do hyperparameter selection on 126 different models, ranging from linear models to Keras models, with each taking 4 to 18 hours to run.

I plan to do this on Google Colab, as I do not have enough computing resources. Should I open 126 Google accounts and train all of the models in parallel using 126 Colab CPUs/GPUs? Or should I open 126 Colab notebooks on the same account and run the models there? Will the 126 notebooks share the same resources, or will each notebook have access to a separate CPU?

Upvotes: 3

Views: 4411

Answers (3)

Leonardo Emili

Reputation: 401

Please note that even though Google Colab is free to use, it does not give you full access to resources on the base plan. For this reason, the second approach would not work, since you get just a single high-end GPU and a single CPU core. You could try creating multiple Google accounts, but be aware that, according to Colab's policy, they encourage interactive sessions over batch ones, so you could end up with your training stopped after a few hours.
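If you want to verify this yourself, a quick sanity check is to inspect what a session actually gets. This is a minimal sketch for a Colab cell (the `!nvidia-smi` line only works when a GPU runtime is attached):

```python
# Run this in a Colab cell to see the resources assigned to this session.
import multiprocessing

print("CPU cores:", multiprocessing.cpu_count())

# Shows the GPU attached to this runtime, if any
!nvidia-smi
```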

I personally suggest you try a different approach: use less computational power by changing the model or introducing regularization (in this case, I suspect that weight decay would help you merge similar models into single ones).
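If you go the weight-decay route, here is a minimal sketch of adding L2 weight decay to a Keras model; the layer sizes and the 1e-4 strength are placeholder assumptions you would tune yourself:

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

# Illustrative model only: L2 weight decay penalizes large weights,
# acting as the regularization suggested above.
model = tf.keras.Sequential([
    layers.Dense(64, activation="relu",
                 kernel_regularizer=regularizers.l2(1e-4)),  # assumed strength
    layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
```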

Upvotes: 0

Jakob Schödl

Reputation: 416

The amount of usable GPU and RAM on Colab is limited. You can try out how many scripts you can run at the same time and start using other accounts after that. Note that inactive sessions in Colab will be closed.

I personally would try to find a way that involves less computational power. Google Colab has a limited amount of hardware available, and using too much of it might prevent other users from getting a GPU. Also, abusing its capacity could result in a ban for you.

Upvotes: 2

Andrew Holmgren

Reputation: 1275

That's not what Colab is for. You could try the hyperparameter tuner in TensorFlow (Keras Tuner) instead: https://blog.tensorflow.org/2020/01/hyperparameter-tuning-with-keras-tuner.html
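For reference, here is a minimal sketch of the Keras Tuner workflow that the blog post describes, using the package's current import name `keras_tuner`; the search space, trial budget, and the `x_train`/`y_train`/`x_val`/`y_val` variables are assumptions standing in for your own setup:

```python
import tensorflow as tf
import keras_tuner as kt

def build_model(hp):
    # The tuner calls this once per trial with a fresh set of hyperparameters.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(
            units=hp.Int("units", min_value=32, max_value=256, step=32),
            activation="relu"),
        tf.keras.layers.Dense(1),
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(
            hp.Choice("learning_rate", [1e-2, 1e-3, 1e-4])),
        loss="mse")
    return model

tuner = kt.RandomSearch(
    build_model,
    objective="val_loss",
    max_trials=20,           # assumed budget
    directory="tuning",
    project_name="ensemble_member")

# x_train, y_train, x_val, y_val are placeholders for your own data
tuner.search(x_train, y_train, epochs=10,
             validation_data=(x_val, y_val))
best_model = tuner.get_best_models(num_models=1)[0]
```

This searches one model's hyperparameters inside a single session instead of dedicating a whole runtime to each configuration.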

Upvotes: 0
