Reputation: 518
I'm running a Mask R-CNN model on an edge device (with an NVIDIA GTX 1080). I am currently using the Detectron2 Mask R-CNN implementation and I achieve an inference speed of around 5 FPS.
To speed this up I looked at other inference engines and model implementations, for example ONNX, but I was not able to gain a faster inference speed.
TensorRT looks very promising to me, but I did not find a ready "out-of-the-box" implementation for it.
Are there any other mature and fast inference engines or other techniques to speed up the inference?
Upvotes: 7
Views: 7742
Reputation: 518
As @kkHarshit already mentioned, it is very hard to speed up Mask R-CNN any further.
The fastest instance segmentation model that I found is YolactEdge: Real-time Instance Segmentation on the Edge (Jetson AGX Xavier: 30 FPS, RTX 2080 Ti: 170 FPS).
Its performance is worse than Mask R-CNN or even YOLACT, but it is still very good.
Upvotes: 0
Reputation: 1234
OpenCV 4.5.0 with `DNN_BACKEND_CUDA` and `DNN_TARGET_CUDA`/`DNN_TARGET_CUDA_FP16`.
Mask R-CNN with a 1024 × 1024 input image:

Device             | FPS
------------------ | ---
GTX 1080 Ti (FP32) | 29
RTX 2080 Ti (FP16) | 60
The measured FPS includes NMS but excludes other preprocessing and postprocessing. The network runs fully end-to-end on the GPU.
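The linked gist contains the full benchmark; the measurement idea can be sketched as a generic timing helper that excludes warm-up iterations (so one-time CUDA initialization and kernel compilation don't skew the average). The helper below is an illustration, not the gist's code:

```python
import time

def measure_fps(run_inference, n_warmup=10, n_iters=100):
    """Average FPS over n_iters calls, after n_warmup untimed calls.

    run_inference: a zero-argument callable that performs one
    forward pass (e.g. lambda: net.forward(out_names)).
    """
    for _ in range(n_warmup):
        run_inference()  # warm-up: excluded from timing
    start = time.perf_counter()
    for _ in range(n_iters):
        run_inference()
    elapsed = time.perf_counter() - start
    return n_iters / elapsed
```

Usage would be something like `measure_fps(lambda: net.forward(out_names))` with the blob already set as input, which matches the "NMS included, pre/postprocessing excluded" scope of the numbers above.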
Benchmark code: https://gist.github.com/YashasSamaga/48bdb167303e10f4d07b754888ddbdcf
Upvotes: 2
Reputation: 12837
It's almost impossible to get higher inference speed for Mask R-CNN on a GTX 1080. You may check Detectron2 by Facebook AI Research.
Otherwise, I'd suggest using YOLACT (You Only Look At CoefficienTs); it can achieve real-time instance segmentation.
On the other hand, if you don't need instance segmentation, you can use YOLO, SSD, etc. for object detection.
Upvotes: 3