CS Yang

Reputation: 21

Mask RCNN Resource exhausted (OOM) on my own dataset

Help needed for a Mask R-CNN ResourceExhaustedError.

H/W - i7-8700, 32 GB RAM, single ASUS ROG STRIX 1080 Ti (11 GB)

Virtual env setup - tensorflow-gpu==1.5.0, python==3.6.6, CUDA 9.0.176, cuDNN 7.2.1

Image resolution - maximum width 900 pixels, maximum height 675 pixels, minimum width 194 pixels, minimum height 150 pixels; 11 images for training

S/W - IMAGES_PER_GPU = 1 (in class xxConfig(Config), xxx.py), BACKBONE = "resnet50", POST_NMS_ROIS_TRAINING = 1000, POST_NMS_ROIS_INFERENCE = 500, IMAGE_RESIZE_MODE = "square", IMAGE_MIN_DIM = 400, IMAGE_MAX_DIM = 512, TRAIN_ROIS_PER_IMAGE = 100
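The settings above are class attributes overridden in a `Config` subclass, in the style of Matterport's Mask_RCNN. A minimal sketch of those overrides (the stand-in `Config` base below is an assumption so the snippet runs without the library installed; in the real repo the base is `mrcnn.config.Config`, and the defaults shown are believed to match it):

```python
# Stand-in for mrcnn.config.Config (assumption: these defaults mirror the
# Matterport repo; install the real library for actual training).
class Config:
    IMAGES_PER_GPU = 2
    BACKBONE = "resnet101"
    POST_NMS_ROIS_TRAINING = 2000
    POST_NMS_ROIS_INFERENCE = 1000
    IMAGE_RESIZE_MODE = "square"
    IMAGE_MIN_DIM = 800
    IMAGE_MAX_DIM = 1024
    TRAIN_ROIS_PER_IMAGE = 200

class LowMemoryConfig(Config):
    """The question's overrides, each chosen to shrink GPU memory use."""
    IMAGES_PER_GPU = 1             # one image per batch per GPU
    BACKBONE = "resnet50"          # smaller backbone than the resnet101 default
    POST_NMS_ROIS_TRAINING = 1000  # fewer proposals kept after NMS
    POST_NMS_ROIS_INFERENCE = 500
    IMAGE_MIN_DIM = 400            # smaller resized input images
    IMAGE_MAX_DIM = 512
    TRAIN_ROIS_PER_IMAGE = 100     # fewer ROIs fed to the heads per image

config = LowMemoryConfig()
```

Each override trades accuracy or throughput for a smaller activation footprint, which is the usual first response to an OOM.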

What was strange to me was that nvidia-smi showed < 300 MB used by the python process; the terminal, however, showed the following:

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[3,3,256,256] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[Node: fpn_p5/random_uniform/RandomUniform = RandomUniform[T=DT_INT32, dtype=DT_FLOAT, seed=87654321, seed2=5038409, _device="/job:localhost/replica:0/task:0/device:GPU:0"]]]

(Screenshots attached: nvidia-smi output and the error log from running the code.)

Upvotes: 0

Views: 2196

Answers (2)

CS Yang

Reputation: 21

After replacing cuDNN 7.2.1 with 7.0.5, I am now able to train Mask R-CNN on the 1080 Ti without the resource exhausted (OOM) issue.

Upvotes: 0

Andreas Pasternak

Reputation: 1299

TensorFlow by default allocates all GPU memory. So if you only see a few hundred MB allocated in nvidia-smi, you have most likely set some option in TensorFlow that limits GPU memory, such as:

config.gpu_options.allow_growth = True

or

config.gpu_options.per_process_gpu_memory_fraction = 0.4

Remove these options and try again. See also: https://www.tensorflow.org/guide/using_gpu
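For reference, in TF 1.x these options live on a `tf.ConfigProto` that is passed to the session; a sketch of where they would appear (TF 1.x API, matching the question's tensorflow-gpu==1.5.0 environment):

```python
import tensorflow as tf  # TF 1.x API, matching tensorflow-gpu==1.5.0

config = tf.ConfigProto()
# Either line below caps TensorFlow's otherwise-greedy GPU allocation;
# removing both restores the default "grab all GPU memory" behavior.
config.gpu_options.allow_growth = True                      # grow as needed
# config.gpu_options.per_process_gpu_memory_fraction = 0.4  # hard cap at 40%

sess = tf.Session(config=config)
```

If Keras sits on top (as with Matterport's Mask_RCNN), the session would be registered via Keras's TensorFlow backend so the model uses it.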

Upvotes: 2
