Kiran Roy

Reputation: 33

Running detectron2 with CUDA (4GB GPU)

I'm getting the error below:

RuntimeError: CUDA out of memory. Tried to allocate 54.00 MiB (GPU 0; 4.00 GiB total capacity; 624.92 MiB already allocated; 2.02 GiB free; 720.00 MiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.

I'm trying to use detectron2 for custom object detection with the following config:

import os

from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultTrainer

cfg = get_cfg()
cfg.MODEL.DEVICE = "cuda"

cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.DATASETS.TRAIN = ("pan_train",)
cfg.DATASETS.TEST = ()
cfg.DATALOADER.NUM_WORKERS = 2
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")  # initialize from model zoo weights
cfg.SOLVER.IMS_PER_BATCH = 2
cfg.SOLVER.BASE_LR = 0.00025  # pick a good LR
cfg.SOLVER.MAX_ITER = 300     # enough for a toy dataset; train longer for a practical one
cfg.SOLVER.STEPS = []         # do not decay learning rate
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 128  # faster, and good enough for a small dataset (default: 512)
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 11  # number of classes in the custom dataset (see https://detectron2.readthedocs.io/tutorials/datasets.html#update-the-config-for-new-datasets)
# NOTE: this config means the number of classes itself; some popular unofficial tutorials incorrectly use num_classes + 1 here.

os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()

Ref. https://github.com/AarohiSingla/Detectron2-Tutorial/blob/main/Detectron_maskrcnn_custom_dataset_baloon.ipynb

How can I solve this error?

Upvotes: 0

Views: 2500

Answers (1)

ErnestBidouille

Reputation: 1146

You should try emptying the CUDA cache with:

import torch

torch.cuda.empty_cache()

cfg = get_cfg()
cfg.MODEL.DEVICE = "cuda"
...

If you still get this error, try decreasing the batch size to reduce GPU memory usage.
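For instance, here is a minimal sketch of memory-saving tweaks applied to the config from your question. The values are illustrative starting points, not tuned recommendations, and max_split_size_mb is the allocator option your error message itself suggests:

import os

# Reduce fragmentation in PyTorch's CUDA allocator, as the error message
# suggests; must be set before CUDA is first initialized.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

cfg.SOLVER.IMS_PER_BATCH = 1                    # fewer images per training step
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 64   # fewer ROI samples per image (default: 512)

Lowering IMS_PER_BATCH has the biggest effect on peak memory; reducing BATCH_SIZE_PER_IMAGE trades some accuracy for a further saving.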

Upvotes: 1
