Emil Nowosielski

Reputation: 53

Out of memory at every second trial using Ray Tune

I am tuning hyperparameters with Ray Tune. The model is built with TensorFlow and occupies a large share of the available GPU memory. I noticed that every second trial fails with an out-of-memory error. The GPU memory usage graph suggests that memory is freed between consecutive trials, including at the point where the OOM error occurred. I should add that with smaller models I do not encounter this error, even though the graph looks the same.

How can I deal with this out-of-memory error occurring on every second trial?

Memory usage graph

Upvotes: 5

Views: 2934

Answers (1)

richliaw

Reputation: 2045

There's actually a utility that helps avoid this:

https://docs.ray.io/en/master/tune/api_docs/trainable.html#ray.tune.utils.wait_for_gpu

from ray import tune

def tune_func(config):
    # Block until the previous trial's process has released its GPU memory.
    tune.utils.wait_for_gpu()
    train()  # your training routine

tune.run(tune_func, resources_per_trial={"gpu": 1}, num_samples=10)
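The idea behind `wait_for_gpu` is simple: the previous trial's process may not have released its GPU memory by the time the next trial starts, so the utility polls memory usage and only returns once it drops below a threshold. A minimal sketch of that retry pattern, with a hypothetical `get_used_fraction` probe standing in for the NVML query the real utility performs (function name and defaults here are illustrative, not Ray's API):

```python
import time

def wait_for_free_memory(get_used_fraction, target_util=0.01,
                         retry=20, delay_s=5):
    """Poll GPU memory usage until it drops below target_util.

    get_used_fraction: hypothetical probe returning the fraction of
    GPU memory currently in use (e.g. backed by pynvml in practice).
    Raises if the memory is not freed within retry * delay_s seconds.
    """
    for _ in range(retry):
        if get_used_fraction() <= target_util:
            return True
        time.sleep(delay_s)
    raise RuntimeError("GPU memory was not freed in time")
```

Calling this at the top of the trainable makes every second trial wait out the teardown of its predecessor instead of attempting to allocate on a still-occupied GPU.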

Upvotes: 3
