Reputation: 1
I have a working instance segmentation setup using the "mask_rcnn_R_101_FPN_3x" model. When I run inference on an image, it takes about 3 seconds per image on a GPU. How can I speed it up?
I am working in Google Colab.
This is my setup config:
import os
from detectron2 import model_zoo
from detectron2.config import get_cfg

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_101_FPN_3x.yaml"))
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1
cfg.OUTPUT_DIR = "/content/drive/MyDrive/TEAM/save/"
cfg.DATASETS.TRAIN = (train_name,)
cfg.DATASETS.TEST = (test_name,)
cfg.MODEL.WEIGHTS = os.path.join(cfg.OUTPUT_DIR, "model_final.pth")
This is the inference code:
import time
import cv2
import torch
from detectron2.engine import DefaultPredictor

torch.backends.cudnn.benchmark = True
start = time.time()
predictor = DefaultPredictor(cfg)
im = cv2.imread("/content/drive/MyDrive/TEAM/mcocr_val_145114ixmyt.jpg")
outputs = predictor(im)
print(f"Inference time per image is : {(time.time() - start)} s")
The printed output:
Inference time per image is : 2.7835421562194824 s
The image I run inference on is 1024 x 1024 pixels. I have tried different sizes, but it still takes about 3 seconds per image. Am I missing anything about Detectron2?
More information about the GPU is in the attached screenshot.
Upvotes: 0
Views: 4277
Reputation: 1133
There is a third way. You could use a faster toolkit for inference, e.g. OpenVINO. OpenVINO is optimized specifically for Intel hardware, but it should work with any CPU. It optimizes your model by converting it to Intermediate Representation (IR), performing graph pruning, and fusing some operations into others while preserving accuracy. It then uses vectorization at runtime.
If you are able to export the Detectron2 model to ONNX, you can utilize OpenVINO. You can find a full tutorial on how to convert the ONNX model, along with a performance comparison, here. Some snippets are below.
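Export Detectron2 to ONNX
Detectron2 provides a TracingAdapter in detectron2.export that flattens the model's dict-based interface so it can be traced and exported. The snippet below is only a sketch under that assumption: the exact export path and supported opset differ between Detectron2 versions, and tracing with a representative image usually works better than random data.
import torch
from detectron2.checkpoint import DetectionCheckpointer
from detectron2.export import TracingAdapter
from detectron2.modeling import build_model

# Build the model from the same cfg as in the question and load the trained weights
model = build_model(cfg)
DetectionCheckpointer(model).load(cfg.MODEL.WEIGHTS)
model.eval()

# Detectron2 models take a list of dicts; TracingAdapter exposes a plain-tensor
# interface that torch.onnx.export can trace
image = torch.rand(3, 1024, 1024) * 255  # dummy CHW input; a real image is preferable
adapter = TracingAdapter(model, [{"image": image}])

with open("model.onnx", "wb") as f:
    torch.onnx.export(adapter, (image,), f, opset_version=11)  # opset 11 is an assumption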
Install OpenVINO
The easiest way to do it is with pip, especially when you work in Google Colab.
pip install openvino-dev[onnx]
Use Model Optimizer to convert ONNX model
The Model Optimizer is a command-line tool that comes with the OpenVINO Development Package. It converts the ONNX model to IR, which is the default format for OpenVINO. You can also try FP16 precision, which should give you better performance (just change data_type). Run in the command line:
mo --input_model "model.onnx" --input_shape "[1,3,224,224]" --mean_values="[123.675, 116.28, 103.53]" --scale_values="[58.395, 57.12, 57.375]" --data_type FP32 --output_dir "model_ir"
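For example, the same conversion at FP16 precision only changes data_type (the separate output directory here is just to keep both IRs side by side):
mo --input_model "model.onnx" --input_shape "[1,3,224,224]" --mean_values="[123.675, 116.28, 103.53]" --scale_values="[58.395, 57.12, 57.375]" --data_type FP16 --output_dir "model_ir_fp16"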
Run the inference
The converted model can be loaded by the runtime and compiled for a specific device, e.g. CPU or GPU (the graphics integrated into your CPU, like Intel HD Graphics). If you don't know what the best choice is for you, just use AUTO.
from openvino.runtime import Core

# Load the network
ie = Core()
model_ir = ie.read_model(model="model_ir/model.xml")
compiled_model_ir = ie.compile_model(model=model_ir, device_name="CPU")
# Get the output layer
output_layer_ir = compiled_model_ir.output(0)
# Run inference on the input image
result = compiled_model_ir([input_image])[output_layer_ir]
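The snippet assumes input_image is already an NCHW float blob matching the shape the model was converted with. A minimal sketch of that preprocessing, assuming the 1x3x224x224 input_shape from the Model Optimizer command above (the mean/scale values are already baked into the IR):
import cv2
import numpy as np

im = cv2.imread("/content/drive/MyDrive/TEAM/mcocr_val_145114ixmyt.jpg")
im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)  # cv2 loads BGR; the mean/scale values above are in RGB order
im = cv2.resize(im, (224, 224))           # match the converted input shape
input_image = np.expand_dims(im.transpose(2, 0, 1), 0).astype(np.float32)  # HWC -> NCHW, add batch dim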
Disclaimer: I work on OpenVINO.
Upvotes: 1
Reputation: 85
These are the two best ways to decrease inference time:
Decreasing the image size will not reduce the inference time, because Mask R-CNN has the same number of parameters regardless of the image size.
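One way to check this, and to get a cleaner per-image number than the question's measurement (which also times DefaultPredictor construction), is to build the predictor once, do a warm-up run, and time only the predictor(im) calls. A minimal sketch, assuming the cfg and image path from the question; note that DefaultPredictor also resizes inputs internally according to cfg.INPUT.MIN_SIZE_TEST, which is part of why feeding smaller images changes little:
import time
import cv2
import torch
from detectron2.engine import DefaultPredictor

predictor = DefaultPredictor(cfg)  # build once, outside the timed region
im = cv2.imread("/content/drive/MyDrive/TEAM/mcocr_val_145114ixmyt.jpg")

for size in (512, 1024):
    resized = cv2.resize(im, (size, size))
    predictor(resized)             # warm-up run (CUDA init, cuDNN autotune)
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(10):
        predictor(resized)
    torch.cuda.synchronize()
    print(f"{size}x{size}: {(time.time() - start) / 10:.3f} s / image")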
Upvotes: 1