Reputation: 45
Task: Mask RCNN train_shapes.ipynb tutorial. Training to segment different shapes in the artificially generated shapes dataset.
Problem: Matterport's Mask RCNN implementation doesn't work out of the box for this notebook.
Things I have tried:
Parameters I have set:
Configurations:
BACKBONE resnet101
BACKBONE_STRIDES [4, 8, 16, 32, 64]
BATCH_SIZE 1
BBOX_STD_DEV [0.1 0.1 0.2 0.2]
COMPUTE_BACKBONE_SHAPE None
DETECTION_MAX_INSTANCES 100
DETECTION_MIN_CONFIDENCE 0.7
DETECTION_NMS_THRESHOLD 0.3
FPN_CLASSIF_FC_LAYERS_SIZE 1024
GPU_COUNT 1
GRADIENT_CLIP_NORM 5.0
IMAGES_PER_GPU 1
IMAGE_CHANNEL_COUNT 3
IMAGE_MAX_DIM 128
IMAGE_META_SIZE 16
IMAGE_MIN_DIM 128
IMAGE_MIN_SCALE 0
IMAGE_RESIZE_MODE square
IMAGE_SHAPE [128 128 3]
LEARNING_MOMENTUM 0.9
LEARNING_RATE 0.001
LOSS_WEIGHTS {'rpn_class_loss': 1.0, 'rpn_bbox_loss': 1.0, 'mrcnn_class_loss': 1.0, 'mrcnn_bbox_loss': 1.0, 'mrcnn_mask_loss': 1.0}
MASK_POOL_SIZE 14
MASK_SHAPE [28, 28]
MAX_GT_INSTANCES 100
MEAN_PIXEL [123.7 116.8 103.9]
MINI_MASK_SHAPE (56, 56)
NAME shapes
NUM_CLASSES 4
POOL_SIZE 7
POST_NMS_ROIS_INFERENCE 1000
POST_NMS_ROIS_TRAINING 2000
PRE_NMS_LIMIT 6000
ROI_POSITIVE_RATIO 0.33
RPN_ANCHOR_RATIOS [0.5, 1, 2]
RPN_ANCHOR_SCALES (8, 16, 32, 64, 128)
RPN_ANCHOR_STRIDE 1
RPN_BBOX_STD_DEV [0.1 0.1 0.2 0.2]
RPN_NMS_THRESHOLD 0.7
RPN_TRAIN_ANCHORS_PER_IMAGE 256
STEPS_PER_EPOCH 5
TOP_DOWN_PYRAMID_SIZE 256
TRAIN_BN False
TRAIN_ROIS_PER_IMAGE 5
USE_MINI_MASK False
USE_RPN_ROIS True
VALIDATION_STEPS 5
WEIGHT_DECAY 0.0001
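Three of the values in the dump above (BATCH_SIZE, IMAGE_SHAPE and IMAGE_META_SIZE) are derived from other settings rather than set directly. Below is a minimal sketch that recomputes them. It assumes the formulas in Matterport's mrcnn/config.py (Config.__init__) as I recall them. The helper name derived_config_values is hypothetical and only for illustration.

```python
def derived_config_values(gpu_count, images_per_gpu, image_resize_mode,
                          image_min_dim, image_max_dim, image_channel_count,
                          num_classes):
    """Recompute the derived fields of a Mask RCNN config (hypothetical helper)."""
    # Effective batch size across all GPUs
    batch_size = images_per_gpu * gpu_count
    # "crop" mode trains on IMAGE_MIN_DIM crops; other modes use IMAGE_MAX_DIM
    side = image_min_dim if image_resize_mode == "crop" else image_max_dim
    image_shape = [side, side, image_channel_count]
    # image_id (1) + original shape (3) + resized shape (3)
    # + window (4) + scale (1) + one active-class flag per class
    image_meta_size = 1 + 3 + 3 + 4 + 1 + num_classes
    return {
        "BATCH_SIZE": batch_size,
        "IMAGE_SHAPE": image_shape,
        "IMAGE_META_SIZE": image_meta_size,
    }


values = derived_config_values(gpu_count=1, images_per_gpu=1,
                               image_resize_mode="square",
                               image_min_dim=128, image_max_dim=128,
                               image_channel_count=3, num_classes=4)
print(values)
```

With the settings from the dump this gives BATCH_SIZE 1, IMAGE_SHAPE [128, 128, 3] and IMAGE_META_SIZE 16, which matches the printed configuration.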
Implementation details:
Output:
Starting at epoch 0. LR=0.001
Checkpoint Path: /logs/shapes20211123T0437/mask_rcnn_shapes_{epoch:04d}.h5
Selecting layers to train
fpn_c5p5 (Conv2D)
fpn_c4p4 (Conv2D)
fpn_c3p3 (Conv2D)
fpn_c2p2 (Conv2D)
fpn_p5 (Conv2D)
fpn_p2 (Conv2D)
fpn_p3 (Conv2D)
fpn_p4 (Conv2D)
rpn_model (Functional)
mrcnn_mask_conv1 (TimeDistributed)
mrcnn_mask_bn1 (TimeDistributed)
mrcnn_mask_conv2 (TimeDistributed)
mrcnn_mask_bn2 (TimeDistributed)
mrcnn_class_conv1 (TimeDistributed)
mrcnn_class_bn1 (TimeDistributed)
mrcnn_mask_conv3 (TimeDistributed)
mrcnn_mask_bn3 (TimeDistributed)
mrcnn_class_conv2 (TimeDistributed)
mrcnn_class_bn2 (TimeDistributed)
mrcnn_mask_conv4 (TimeDistributed)
mrcnn_mask_bn4 (TimeDistributed)
mrcnn_bbox_fc (TimeDistributed)
mrcnn_mask_deconv (TimeDistributed)
mrcnn_class_logits (TimeDistributed)
mrcnn_mask (TimeDistributed)
/usr/local/lib/python3.7/dist-packages/keras/optimizer_v2/gradient_descent.py:102: UserWarning: The `lr` argument is deprecated, use `learning_rate` instead.
super(SGD, self).__init__(name, **kwargs)
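The "Selecting layers to train" list is what training only the heads selects: the FPN, RPN and Mask RCNN head layers, without the ResNet backbone. Below is a minimal sketch of that selection. It assumes an equivalent of the "heads" regex from Matterport's MaskRCNN.train() and the re.fullmatch check in set_trainable(), as I recall them. The backbone names conv1 and res2a_branch2a are examples for contrast and are not copied from this log.

```python
import re

# Assumed equivalent of the "heads" entry of layer_regex in mrcnn/model.py
HEADS_REGEX = r"(mrcnn_.*)|(rpn_.*)|(fpn_.*)"


def is_trainable(layer_name, layer_regex=HEADS_REGEX):
    """Return True if the whole layer name matches the regex, i.e. it would be trained."""
    return bool(re.fullmatch(layer_regex, layer_name))


# Names from the log above, plus two ResNet backbone names for contrast
for name in ["fpn_c5p5", "rpn_model", "mrcnn_mask", "conv1", "res2a_branch2a"]:
    print(f"{name:16} trainable={is_trainable(name)}")
```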
System hardware specifications:
Software Specifications:
Questions:
Notebook: Colab notebook
Upvotes: 0
Views: 901
Reputation: 1840
I had the same issue. Setting workers to 1 and disabling multiprocessing didn't fix it for me. It turned out that training was running on the CPU instead of the GPU. The fix was to make sure CUDA was installed properly or, on an HPC cluster, to load it with something like module load cuda and make sure you've provisioned a node with a GPU.
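Before starting training, you can quickly check whether TensorFlow sees a GPU at all. Below is a minimal sketch. It assumes TensorFlow 2.1 or newer (for tf.config.list_physical_devices). The helper name gpu_report is hypothetical. The nvidia-smi lookup only checks that the NVIDIA driver tool is on PATH.

```python
import importlib.util
import shutil


def gpu_report():
    """Summarise which GPU pieces this environment can see (hypothetical helper)."""
    report = {
        # nvidia-smi ships with the NVIDIA driver; if it is missing, the driver usually is too
        "nvidia_smi_on_path": shutil.which("nvidia-smi") is not None,
        "tensorflow_installed": importlib.util.find_spec("tensorflow") is not None,
        "tf_gpus": [],
    }
    if report["tensorflow_installed"]:
        import tensorflow as tf  # imported lazily so the check also runs without TensorFlow
        # An empty list means TensorFlow found no usable GPU and will run on the CPU
        report["tf_gpus"] = [d.name for d in tf.config.list_physical_devices("GPU")]
    return report


print(gpu_report())
```

On Colab, an empty tf_gpus list usually means the runtime type is not set to GPU (Runtime > Change runtime type).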
Upvotes: 0
Reputation: 930
Try this:
1- Inside the mrcnn folder, open the file model.py.
2- Change line 2362 from:
workers = multiprocessing.cpu_count()
to:
workers = 1
3- Change line 2374 from:
use_multiprocessing=True,
to:
use_multiprocessing=False,
Alternatively, try this fork, where I have already made these changes (this fixed a similar problem for me): https://github.com/manasrda/Mask_RCNN
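If you'd rather not edit by hand, a small script can apply the same two changes. It searches for the text instead of relying on line numbers, which differ between versions and forks. Below is a minimal sketch. The function names patch_multiprocessing and patch_file and the default path mrcnn/model.py are assumptions. The script only rewrites the exact strings quoted in the steps above.

```python
from pathlib import Path

# The exact source lines from the steps above, mapped to their replacements
REPLACEMENTS = {
    "workers = multiprocessing.cpu_count()": "workers = 1",
    "use_multiprocessing=True,": "use_multiprocessing=False,",
}


def patch_multiprocessing(source):
    """Return model.py source that trains with one worker and no multiprocessing."""
    for old, new in REPLACEMENTS.items():
        if old not in source:
            raise ValueError(f"pattern not found, file may already be patched: {old!r}")
        # str.replace changes every occurrence of the pattern
        source = source.replace(old, new)
    return source


def patch_file(path="mrcnn/model.py"):
    """Patch the file in place, keeping a .bak copy of the original (assumed path)."""
    p = Path(path)
    original = p.read_text()
    p.with_name(p.name + ".bak").write_text(original)
    p.write_text(patch_multiprocessing(original))
```

Restart the notebook kernel after patching so the modified mrcnn.model module is re-imported.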
Upvotes: 1
Reputation: 55
The training hangs; this is a known issue. The fix is simple: find the fit call in model.py (somewhere around lines 2360-2370 in the TF2 project) and set the 'workers' argument to 1 and the 'use_multiprocessing' argument to False.
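The exact line numbers vary between the original repository and the TF2 ports, so searching for the two arguments can be easier than scrolling to a line number. Below is a minimal sketch. The function name find_fit_arguments is hypothetical. It only reports matching lines and leaves the edit to you.

```python
import re

# Lines that set the worker count or the multiprocessing flag
PATTERN = re.compile(r"^\s*(workers\s*=|use_multiprocessing\s*=)")


def find_fit_arguments(path):
    """Return 1-indexed (line_number, stripped_line) pairs worth inspecting."""
    hits = []
    with open(path, encoding="utf-8") as f:
        for number, line in enumerate(f, start=1):
            if PATTERN.match(line):
                hits.append((number, line.strip()))
    return hits
```

For example, calling find_fit_arguments("mrcnn/model.py") lists both the workers assignment and the keyword arguments inside the fit call, so you can jump straight to them.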
Upvotes: 1