rahulgarg12342

Reputation: 165

TensorFlow GPU error: Resource Exhausted in the middle of training a model

I'm trying to train a model (an implementation of a research paper) on a K80 GPU with 12 GB of memory available for training. The dataset is about 23 GB, and after data extraction it shrinks to 12 GB for the training script.

At around step 4640 (max_steps is 500,000), I receive a Resource Exhausted error and the script stops soon after.

The memory usage at the beginning of the script is: [image: memory usage at script start]

I went through a lot of similar questions and found that reducing the batch size might help, but I have already reduced the batch size to 50 and the error persists. Is there any solution other than switching to a more powerful GPU?

Upvotes: 0

Views: 694

Answers (1)

Olivier Dehaene

Reputation: 1680

This does not look like a GPU out-of-memory (OOM) error. It looks more like you ran out of space on your local drive while saving a checkpoint of your model.

Are you sure that you have enough space on your disk, and that the folder you save to doesn't have a quota?
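
One quick way to check this from Python is sketched below. It is only a rough check under some assumptions: `checkpoint_dir` is a hypothetical placeholder for wherever your script writes checkpoints, and the helper names `free_space_gb` and `dir_size_gb` are mine, not part of TensorFlow. Also note that `shutil.disk_usage` reports free space for the whole filesystem, so it may not reflect a per-user or per-folder quota.

```python
import os
import shutil


def free_space_gb(path):
    """Return the free space, in GiB, on the filesystem that contains `path`."""
    return shutil.disk_usage(path).free / 1024**3


def dir_size_gb(path):
    """Return the total size, in GiB, of all regular files under `path`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if not os.path.islink(file_path):  # skip symlinks to avoid double counting
                total += os.path.getsize(file_path)
    return total / 1024**3


if __name__ == "__main__":
    # Hypothetical location: replace with the directory your training script saves to.
    checkpoint_dir = "."
    print(f"Free space on disk: {free_space_gb(checkpoint_dir):.2f} GiB")
    print(f"Space already used by checkpoint dir: {dir_size_gb(checkpoint_dir):.2f} GiB")
```

If the free space is close to the size of one checkpoint, then keeping fewer checkpoints or saving to a larger volume would be the thing to try before changing the batch size again.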

Upvotes: 1
