Reputation: 1851
My training jobs only run for a minute or two, so I have increased the resource limit so I can run a large number (500) in parallel. However, I would like to set some upper bound so I don't accidentally end up with 500 jobs each running for several hours.
From the documentation I can find the following:
Maximum run time for a hyperparameter tuning job: 30 days
30 days is definitely too much, but how can I change it? I would love to be able to set it to stop once it hits a maximum total training time, but unlike the other limits, there's no mention that this one can be changed.
Upvotes: 0
Views: 749
Reputation: 5578
While there's no Tuner parameter that limits the tuner job duration, you could set an effective $ spend limit using the Tuner's max_jobs parameter:
from sagemaker.tuner import HyperparameterTuner

allowed_spend_usd = 50        # total budget in USD
instance_cost_usd_hr = 0.1    # hourly price of the training instance
minutes_per_job = 2           # you know this empirically

# Minutes of training the budget buys, then the number of jobs that fits in it
total_train_minutes_allowed = allowed_spend_usd / instance_cost_usd_hr * 60
max_jobs = round(total_train_minutes_allowed / minutes_per_job)

tuner = HyperparameterTuner(max_jobs=max_jobs, ...)
I recommend that you also set a reasonable max_run per training job to further ensure that each training job finishes as fast as you expect (say, 300 seconds if you expect 60-120 seconds).
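If you're using the SageMaker Python SDK, max_run is set on the Estimator that you pass to the Tuner. A minimal sketch, assuming a generic Estimator; the image URI, role, and instance type below are placeholders for your own setup:

from sagemaker.estimator import Estimator

estimator = Estimator(
    image_uri="<your-training-image-uri>",      # placeholder
    role="<your-sagemaker-execution-role>",     # placeholder
    instance_count=1,
    instance_type="ml.m5.large",                # placeholder
    max_run=300,  # hard-stop each training job after 300 seconds
)
tuner = HyperparameterTuner(estimator=estimator, max_jobs=max_jobs, ...)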
Upvotes: 1