Cheetah

Reputation: 14431

Improving image processing speed

I am using C++ and OpenCV to process images taken from a webcam in real time, and I want to get the best speed I can from my system.

Other than changing the processing algorithm (assume, for now, that you can't change it), is there anything I should be doing to maximize the speed of processing?

I think multithreading might help here, but I'm ashamed to say I don't really know the ins and outs (I have used multithreading before, just not in C++).

Assuming I have an x-core processor, does splitting the processing into x threads actually speed things up? Or would the overhead of managing those threads negate the gain, given that I am aiming for a throughput of 20 fps? (I assume the frame rate affects your answer, since it indicates how much processing each thread will have to do.)

Would multithreading help here?

Are there any tips for increasing the speed of OpenCV specifically, or any pitfalls I might be falling into that reduce speed?

Thanks.

Upvotes: 9

Views: 3595

Answers (6)

Velimir Mlaker

Reputation: 10985

As example code for multi-threaded image processing with OpenCV, you might want to check out my code:

https://github.com/vmlaker/sherlock-cpp

It's what I came up with to take advantage of an x-core CPU and improve the performance of object detection. The detect program is basically a parallel algorithm that distributes tasks among multiple threads, with a separate pipelined thread for every task:

  1. Allocation of frame memory and video capture.
  2. Object detection (one thread per Haar classifier).
  3. Augmenting output with detection result and displaying the frame.
  4. Memory deallocation.

With memory for every captured frame shared between all threads, I got great performance and CPU utilization.
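The stages above can be sketched roughly as follows. This is a minimal illustration, not the code from the linked repository: `Frame`, `Channel`, and `run_pipeline` are hypothetical names, the "detection" is a stand-in computation, there is a single detection thread rather than one per classifier, and deallocation is left to `std::shared_ptr` instead of a dedicated thread.

```cpp
#include <cassert>
#include <condition_variable>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical frame record; a real one would also hold the image data.
struct Frame {
    int id = 0;
    int detections = 0;
};
using FramePtr = std::shared_ptr<Frame>;

// Minimal blocking queue that links two pipeline stages.
class Channel {
public:
    void send(FramePtr f) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push(std::move(f));
        }
        ready_.notify_one();
    }
    FramePtr receive() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !queue_.empty(); });
        FramePtr f = std::move(queue_.front());
        queue_.pop();
        return f;
    }
private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<FramePtr> queue_;
};

// Runs capture -> detect -> display on three threads and returns
// the detection counts in the order the display stage saw them.
std::vector<int> run_pipeline(int frame_count) {
    assert(frame_count >= 0);
    Channel captured, detected;
    std::vector<int> shown;

    std::thread capture([&] {
        for (int i = 0; i < frame_count; ++i) {
            auto f = std::make_shared<Frame>();  // step 1: allocate and "capture"
            f->id = i;
            captured.send(std::move(f));
        }
        captured.send(nullptr);                  // a null pointer marks end of stream
    });
    std::thread detect([&] {
        while (FramePtr f = captured.receive()) {
            f->detections = f->id % 3;           // step 2: stand-in for a classifier
            detected.send(std::move(f));
        }
        detected.send(nullptr);
    });
    std::thread display([&] {
        while (FramePtr f = detected.receive()) {
            shown.push_back(f->detections);      // step 3: stand-in for drawing
        }                                        // step 4: frame freed when f goes out of scope
    });

    capture.join();
    detect.join();
    display.join();
    return shown;
}
```

Because every stage passes a pointer rather than a copy, the frame memory is shared between threads, and each stage only touches a frame while it holds it.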

Upvotes: 1

Martin James

Reputation: 24907

If your threads can operate on different data, it would seem reasonable to thread it off, perhaps queueing each frame object to a thread pool. You may have to add sequence numbers to the frame objects to ensure that the processed frames emerging from the pool are delivered in the same order they went in.
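A minimal sketch of the sequence-number idea, under some assumptions: `Frame`, `ReorderBuffer`, and `process_in_order` are hypothetical names, the processing is a stand-in computation, and frames are handed to workers round-robin to keep the example short (a real pool would pull frames from a shared queue).

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical frame: a sequence number plus a value standing in for pixel data.
struct Frame {
    std::size_t seq;
    int value;
};

// Accepts frames in any completion order and releases them in sequence order.
class ReorderBuffer {
public:
    // Called by worker threads when a frame has been processed.
    void push(const Frame& f) {
        std::lock_guard<std::mutex> lock(mutex_);
        pending_.emplace(f.seq, f);
        // Deliver every frame that is now contiguous with the last delivered one.
        for (auto it = pending_.find(next_); it != pending_.end();
             it = pending_.find(next_)) {
            delivered_.push_back(it->second);
            pending_.erase(it);
            ++next_;
        }
    }
    std::vector<Frame> delivered() {
        std::lock_guard<std::mutex> lock(mutex_);
        return delivered_;
    }
private:
    std::mutex mutex_;
    std::map<std::size_t, Frame> pending_;  // finished frames waiting for their turn
    std::vector<Frame> delivered_;          // frames released in capture order
    std::size_t next_ = 0;                  // next sequence number to release
};

// Processes `count` frames on `workers` threads and returns them in capture order.
std::vector<Frame> process_in_order(std::size_t count, unsigned workers) {
    assert(workers > 0);
    ReorderBuffer out;
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        pool.emplace_back([&out, w, workers, count] {
            // Assumption: static round-robin assignment instead of a work queue.
            for (std::size_t seq = w; seq < count; seq += workers) {
                Frame f{seq, static_cast<int>(seq) * 2};  // stand-in for real processing
                out.push(f);
            }
        });
    }
    for (auto& t : pool) t.join();
    return out.delivered();
}
```

Whichever worker finishes first, a frame is only released once all earlier sequence numbers have been released, so the consumer sees frames in the order they were captured.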

Upvotes: 2

CapelliC

Reputation: 60034

The easier way, I think, could be to pipeline the frame operations.

You could work with a thread pool, sequentially allocating a frame memory buffer to the first available thread and releasing it back to the pool when the algorithm step on the associated frame has completed.

This could leave your current (debugged :) algorithm practically unchanged, but it will require substantially more memory for buffering intermediate results.

Of course, without details about your task, it's hard to say if this is appropriate...
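For what it's worth, the buffer hand-out could look something like the sketch below. It is only an illustration under assumptions: `BufferPool` and `process_with_pool` are hypothetical names, a buffer is a plain byte vector rather than an OpenCV matrix, and the per-frame "algorithm step" is a stand-in that fills the buffer and sums it.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

using Buffer = std::vector<unsigned char>;

// Fixed set of preallocated frame buffers shared by all worker threads.
class BufferPool {
public:
    BufferPool(std::size_t count, std::size_t bytes) : free_(count, Buffer(bytes)) {}

    // Blocks until a buffer is free, then hands it to the caller.
    Buffer acquire() {
        std::unique_lock<std::mutex> lock(mutex_);
        available_.wait(lock, [this] { return !free_.empty(); });
        Buffer b = std::move(free_.back());
        free_.pop_back();
        return b;
    }

    // Returns a buffer to the pool and wakes one waiting thread.
    void release(Buffer b) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            free_.push_back(std::move(b));
        }
        available_.notify_one();
    }

    std::size_t free_count() {
        std::lock_guard<std::mutex> lock(mutex_);
        return free_.size();
    }
private:
    std::mutex mutex_;
    std::condition_variable available_;
    std::vector<Buffer> free_;
};

// Hypothetical driver: each frame borrows a buffer, is "processed", and returns it.
std::vector<int> process_with_pool(BufferPool& pool, std::size_t frames, unsigned workers) {
    assert(workers > 0);
    std::vector<int> sums(frames);
    std::vector<std::thread> threads;
    for (unsigned w = 0; w < workers; ++w) {
        threads.emplace_back([&pool, &sums, w, workers, frames] {
            for (std::size_t i = w; i < frames; i += workers) {
                Buffer buf = pool.acquire();
                int sum = 0;
                for (unsigned char& px : buf) {   // stand-in for the real algorithm step
                    px = static_cast<unsigned char>(i);
                    sum += px;
                }
                sums[i] = sum;                    // each thread writes distinct elements
                pool.release(std::move(buf));
            }
        });
    }
    for (auto& t : threads) t.join();
    return sums;
}
```

The pool size caps the extra memory: with fewer buffers than threads, a thread simply waits in `acquire()` until another one releases its buffer.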

Upvotes: 6

kmote

Reputation: 16785

Unless the particular algorithm you are using is already optimized for a multithreaded/parallel platform, throwing it at an x-core processor will do nothing for you. The algorithm has to be inherently threadable to benefit from multiple threads; if it wasn't designed with that in mind, it will have to be altered. On the other hand, many image processing algorithms are "embarrassingly parallel", at least in concept. Can you share more details about the algorithm you have in mind?

Upvotes: 3

Jav_Rock

Reputation: 22245

There is one important thing about increasing speed in OpenCV that is related to neither the processor nor the algorithm: avoiding extra copying when dealing with matrices. Here is an example taken from the documentation:

"...by constructing a header for a part of another matrix. It can be a single row, single column, several rows, several columns, rectangular region in the matrix (called a minor in algebra) or a diagonal. Such operations are also O(1), because the new header will reference the same data. You can actually modify a part of the matrix using this feature, e.g."

// add 5-th row, multiplied by 3 to the 3rd row
M.row(3) = M.row(3) + M.row(5)*3;

// now copy 7-th column to the 1-st column
// M.col(1) = M.col(7); // this will not work
Mat M1 = M.col(1);
M.col(7).copyTo(M1);

Maybe you already knew about this, but I think it is worth highlighting matrix headers in OpenCV as an efficient coding tool.

Upvotes: 5

Jarred

Reputation: 391

Assuming I have an x-core processor, does splitting the processing into x threads actually speed things up?

Yes, although it very heavily depends on the particular algorithm being used, as well as your skill in writing threaded code to handle things like synchronization. You didn't really provide enough detail to make a better assessment than that.

Some algorithms are extremely easy to parallelize, like ones that have the form:

for (int i = 0; i < DATA_SIZE; i++)
{
   output[i] = f(input[i]);
}

for some function f. These are known as embarrassingly parallel; you can simply split the data into N blocks and have each of N threads process one block independently. APIs like OpenMP make this kind of threading extremely simple.
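As a rough sketch of the block-splitting approach, here is a version that uses plain `std::thread` rather than OpenMP. The names `f` and `parallel_map` are placeholders, and `f` is an arbitrary stand-in for whatever per-element work your algorithm does.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Stand-in for the per-element function; any pure function of one input works.
int f(int x) { return x * x + 1; }

// Splits `input` into up to `num_threads` contiguous blocks and maps f over
// each block on its own thread. No locking is needed because every thread
// writes to a disjoint range of `output`.
std::vector<int> parallel_map(const std::vector<int>& input, unsigned num_threads) {
    assert(num_threads > 0);
    std::vector<int> output(input.size());
    // Block size rounded up so that all elements are covered.
    const std::size_t block = (input.size() + num_threads - 1) / num_threads;

    std::vector<std::thread> threads;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(input.size(), t * block);
        const std::size_t end = std::min(input.size(), begin + block);
        if (begin == end) break;  // fewer elements than threads
        threads.emplace_back([&input, &output, begin, end] {
            for (std::size_t i = begin; i < end; ++i) {
                output[i] = f(input[i]);
            }
        });
    }
    for (auto& th : threads) th.join();
    return output;
}
```

Whether this beats the single-threaded loop depends on how expensive f is per element; for very cheap functions the cost of starting threads can outweigh the gain.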

Upvotes: 4
