fdh

Reputation: 5354

Getting better performance using OpenCV?

I need real time processing, but the internal functions of OpenCV are not providing this. I am doing hand gesture recognition, and it works almost perfectly, except for the fact that the resulting output is VERY laggy and slow. I know that this isn't because of my algorithm but the processing times of OpenCV. Is there anything I can do to speed it up?

PS: I don't want to use the IPP libraries, so please don't suggest that. I need increased performance from OpenCV itself.

Upvotes: 4

Views: 12758

Answers (3)

Nuzhny

Reputation: 1927

I use several approaches:

  1. [Application level] On hardware with OpenCL support: switch from cv::Mat to cv::UMat and call cv::ocl::setUseOpenCL(true).
  2. [Library level] In the OpenCV CMake configuration, choose a different parallel backend: TBB may perform better than OpenMP.
  3. [Library level] In the OpenCV CMake configuration, enable IPP support.
  4. [Application level] Cache temporary results. Most OpenCV functions check the format and size of their output arrays, so if you store all results as cv::Mat private members, OpenCV will not allocate and deallocate that memory again on subsequent frames.
  5. [Library -> Application level] Copy the sources of the bottleneck OpenCV functions into your project and apply point 4 to them.

Upvotes: 0

VoteCoffee

Reputation: 5107

Steve-o's answer is good for optimizing your code efficiency. I recommend adding some logic to monitor execution times to help you identify where to spend efforts optimizing.

OpenCV logic for time monitoring (python):

import cv2 as cv

startTime = cv.getTickCount()
# your code execution
elapsedSeconds = (cv.getTickCount() - startTime) / cv.getTickFrequency()

Boost logic for time monitoring:

#include <iostream>
#include <boost/date_time/posix_time/posix_time.hpp>

boost::posix_time::ptime start = boost::posix_time::microsec_clock::local_time();
// do something time-consuming
boost::posix_time::ptime end = boost::posix_time::microsec_clock::local_time();

boost::posix_time::time_duration timeTaken = end - start;
std::cout << timeTaken << std::endl;
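For repeated measurements, it helps to wrap the pattern above in a small helper so you can accumulate per-section totals across many frames. A sketch in plain Python (time.perf_counter stands in for cv.getTickCount here, and the section name is illustrative):

```python
import time
from contextlib import contextmanager

timings = {}  # section name -> accumulated seconds

@contextmanager
def timed(name):
    """Accumulate wall-clock time spent in each named section."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = timings.get(name, 0.0) + time.perf_counter() - start

with timed("preprocess"):
    sum(range(100000))  # stand-in for real work

print(sorted(timings))  # → ['preprocess']
```

Running each stage of your loop under its own name quickly shows which 10% of the code deserves the optimization effort.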

How you configure your OpenCV build matters a lot in my experience. IPP isn't the only option to give you better performance. It really is worth kicking the tires on your build to get better hardware utilization.

The other areas to look at are CPU and memory utilization. If you watch your CPU and/or memory utilization, you'll probably find that 10% of your code is working hard and the rest of the time things are largely idle.

  • Consider restructuring your logic as a pipeline of threads so that you can process multiple images at once. If you're tracking and need the results of previous images, break your code into stages such as preprocessing and analysis and use a std::queue to buffer between them. Note that imshow won't work from worker threads, so push result images into a queue and call imshow from the main thread.
  • Consider using persistent/global objects for things like kernels/detectors that don't need to get recreated each time
  • Is your throughput slowing down the longer your program runs? You may need to look at how you are handling disposing of images/variables within the main loop's scope
  • Segmenting your code in functions makes it more readable, easier to benchmark, and descopes variables earlier (temporary Mat and results variables free up memory when descoped)
  • If you're doing low-level processing on Mat pixels where you iterate over a large portion of the image, use a single parallel for loop over the whole pixel range and avoid writing to shared state from multiple threads
  • Depending on how you are running your code, you may be able to disable debugging to get better performance
  • If you're streaming and dumping frames, prefer changing the camera settings to throttle the streaming rate instead of dumping frames
  • If you're converting from 12 to 8 bits or only using a region of your image, prefer doing this at the camera hardware level
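The pipeline idea in the first bullet can be sketched in plain Python (the stage and the fake "work" are illustrative; in C++ the same shape uses std::thread with a mutex-guarded std::queue):

```python
import queue
import threading

raw = queue.Queue(maxsize=4)   # capture -> worker stage
results = queue.Queue()        # worker stage -> main thread (e.g. for imshow)
STOP = object()                # sentinel to shut the pipeline down

def preprocess_and_analyze():
    """Worker stage: runs concurrently with capture and display."""
    while True:
        frame = raw.get()
        if frame is STOP:
            results.put(STOP)
            return
        results.put(frame * 2)  # stand-in for preprocessing + analysis

worker = threading.Thread(target=preprocess_and_analyze)
worker.start()

for frame in range(5):          # stand-in for captured frames
    raw.put(frame)
raw.put(STOP)

out = []
while True:                     # main thread: collect/display results
    r = results.get()
    if r is STOP:
        break
    out.append(r)
worker.join()
print(out)  # → [0, 2, 4, 6, 8]
```

The bounded raw queue doubles as back-pressure: if analysis falls behind, capture blocks instead of memory growing without limit.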

Here's an example of a parallel for loop:

cv::parallel_for_(cv::Range(0, img.rows * img.cols), [&](const cv::Range& range)
{
    for (int r = range.start; r < range.end; r++)
    {
        // Row-major mapping keeps memory access contiguous within each chunk
        int y = r / img.cols;
        int x = r % img.cols;
        uchar pixelVal = img.at<uchar>(y, x);
        // do work here
    }
});

If you're hardware constrained (i.e. fully utilizing CPU and/or memory), then you need to look at prioritizing your process, OS performance optimizations, freeing system resources, or upgrading your hardware:

  • Increase the priority of the process so it is more greedy with respect to other programs running on the computer (on Linux you have nice(int inc) in unistd.h; on Windows, SetPriorityClass(GetCurrentProcess(), REALTIME_PRIORITY_CLASS) in windows.h)
  • Optimize your power settings for maximum performance in general
  • Disable CPU core parking
  • Optimize your acquisition hardware settings (increase rx/tx buffers, etc) to offload work from your CPU
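Raising process priority as in the first bullet can be sketched in Python on POSIX systems (os.nice wraps the nice(2) call; a negative increment, i.e. higher priority, normally requires elevated privileges, so this sketch only reads the current value):

```python
import os

# os.nice(0) leaves the niceness unchanged and returns the current value.
# To actually raise priority you would pass a negative increment, which
# usually requires running as root.
current = os.nice(0)
print(current)
```

On Windows the equivalent is the SetPriorityClass call shown above.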

Upvotes: 1

Steve-o

Reputation: 12866

Traditional techniques for improving image analysis:

  1. Reduce the image to a monochrome sample.
  2. Reduce the range of samples, e.g. from 8-bit monochrome to 4-bit monochrome.
  3. Reduce the size of the image, e.g. 1024x1024 to 64x64.
  4. Reduce the frame rate, e.g. 60 fps to 5 fps.
  5. Perform a higher-level function to guess where the target area is, say at a lower resolution, then perform the regular analysis on the cropped output; e.g. perform image recognition to locate the hand before determining the gesture.
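Steps 1–3 can be sketched with numpy on a synthetic frame (with OpenCV you would normally use cv::cvtColor, bit shifts, and cv::resize; the manual operations below just make the reductions explicit):

```python
import numpy as np

rgb = np.random.randint(0, 256, (1024, 1024, 3), dtype=np.uint8)  # synthetic frame

# 1. Reduce to a monochrome sample (simple channel average).
mono = rgb.mean(axis=2).astype(np.uint8)

# 2. Reduce the sample range from 8-bit to 4-bit (keep the top 4 bits).
mono4 = mono >> 4            # values now in 0..15

# 3. Reduce the image size by decimation, 1024x1024 -> 64x64.
small = mono4[::16, ::16]

print(small.shape)  # → (64, 64)
```

Each step shrinks the data the per-frame analysis has to touch, which is usually where the real-time budget goes.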

Upvotes: 11
