kacpo1

Reputation: 565

How to improve YOLOv3 detection time? (OpenCV + Python)

I'm using a custom-trained YOLOv3 model with OpenCV 4.2.0 compiled with CUDA. When I test my code in Python, OpenCV runs on the GPU (GTX 1050 Ti), but detection on a single 416 px x 416 px image takes 0.055 s (~20 FPS). My config file is set up for small-object detection, because I need to detect objects of about 10 px x 10 px in 2500 px x 2000 px images, so I split each original image into 30 smaller pieces. My goal is to reach 0.013 s (~80 FPS) per 416 px x 416 px image. Is that possible in Python with OpenCV? If not, what is the proper way to do it?

PS: Detection currently uses about 50% of the CPU, 5 GB of RAM and 6% of the GPU.
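For reference, the tiling and timing arithmetic behind these numbers can be sketched as follows. The 6 x 5 grid is an assumption (the post only says "30 pieces"), and `tile_boxes` and `per_image_seconds` are hypothetical helper names. With this layout each tile is roughly 417 px x 400 px, so it would still be resized to the 416 px x 416 px network input.

```python
# Sketch only: the 6 x 5 grid layout is an assumption; the post only says "30 pieces".

def tile_boxes(width, height, cols, rows):
    """Return (x0, y0, x1, y1) boxes that cover the image without gaps or overlap."""
    boxes = []
    for r in range(rows):
        # Integer boundaries spread any remainder pixels across the rows/columns.
        y0 = r * height // rows
        y1 = (r + 1) * height // rows
        for c in range(cols):
            x0 = c * width // cols
            x1 = (c + 1) * width // cols
            boxes.append((x0, y0, x1, y1))
    return boxes


def per_image_seconds(num_tiles, seconds_per_tile):
    """Total detection time for one full image when tiles are processed one by one."""
    return num_tiles * seconds_per_tile


boxes = tile_boxes(2500, 2000, cols=6, rows=5)
current = per_image_seconds(len(boxes), 0.055)  # measured time per tile
target = per_image_seconds(len(boxes), 0.013)   # desired time per tile

print(len(boxes), "tiles")
print(f"current: {current:.2f} s per full image, target: {target:.2f} s")
```

In other words, one full 2500 px x 2000 px image currently takes about 1.65 s, and the goal corresponds to about 0.39 s.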

Upvotes: 2

Views: 2740

Answers (1)

Venkatesh Wadawadagi

Reputation: 2943

Some of the preferred ways to improve detection time with an already trained YOLOv3 model are:

  • Quantisation: run inference with INT8 instead of FP32. You can use this repo for this purpose.
  • Use an inference accelerator such as TensorRT, since you're using an NVIDIA GPU. The tool includes a good number of inference-oriented optimisations, along with INT8 and FP16 quantisation, to reduce detection time. This thread discusses YOLOv3 inference with TensorRT 5. Use this repo for YOLOv3 on TensorRT 7.
  • Use an inference library such as tkDNN, a deep neural network library built with cuDNN and TensorRT primitives and designed specifically to work on NVIDIA Jetson boards.
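To make the INT8 idea above concrete, here is a minimal sketch of the arithmetic behind symmetric per-tensor quantisation. It is only an illustration in plain Python, not how TensorRT or any of the linked repos implement it; real toolchains also calibrate activation ranges on sample data. The function names and the weight values are made up for the example.

```python
# Sketch of symmetric per-tensor INT8 quantisation. Real toolchains also
# calibrate activation ranges, which is not shown here.

def quantize_int8(values):
    """Map floats to integers in [-127, 127] using one shared scale factor."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the integers."""
    return [q * scale for q in quantized]


weights = [0.82, -0.31, 0.05, -1.27, 0.0]  # made-up FP32 weights
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print("quantised:", q)
print("scale:", scale)
print("max round-trip error:", max_error)
```

The round-trip error is bounded by half the scale factor, which is why INT8 inference usually costs some accuracy in exchange for smaller, faster arithmetic.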

If you're open to retraining the model, there are a few more options beyond the ones mentioned above:

  • Train a tiny variant rather than the full YOLO model. This comes at the cost of a drop in accuracy/mAP. You can train tiny-yolov4 (the latest model) or tiny-yolov3.
  • Model pruning: if you rank the neurons in the network according to how much they contribute, you can then remove the low-ranking neurons, resulting in a smaller and faster network. See the pruned YOLOv3 research paper and its implementation. This is another pruned YOLOv3 implementation.
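The ranking step behind pruning can be sketched like this. It assumes that the absolute value of a per-channel scale factor is used as the "contribution" score, which is one common choice but not necessarily what the linked paper or implementations do. `select_channels` and the scores are hypothetical.

```python
# Sketch of magnitude-based channel pruning. Using |score| of a per-channel
# scale factor as the contribution measure is an assumption.

def select_channels(scores, keep_ratio):
    """Return the sorted indices of the channels to keep, ranked by |score|."""
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    num_keep = max(1, int(len(scores) * keep_ratio))
    # Rank channel indices from most to least important.
    ranked = sorted(range(len(scores)), key=lambda i: abs(scores[i]), reverse=True)
    return sorted(ranked[:num_keep])


scores = [0.91, 0.02, -0.47, 0.003, 0.30, -0.08, 0.66, 0.01]  # made-up values
kept = select_channels(scores, keep_ratio=0.5)
pruned = [i for i in range(len(scores)) if i not in kept]

print("keep:", kept)
print("prune:", pruned)
```

A pruned network typically needs to be fine-tuned afterwards to recover some of the lost accuracy.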

Upvotes: 2
