Reputation: 14431
I am using C++ and OpenCV to process some images taken from a Webcam in realtime and I am looking to get the best speed I can from my system.
Other than changing the processing algorithm (assume, for now, that you can't change it), is there anything I should be doing to maximize the speed of processing?
I am thinking multithreading could help here, but I'm ashamed to say I don't really know the ins and outs (I have used multithreading before, but not in C++).
Assuming I have an x-core processor, does splitting the processing into x threads actually speed things up? Or would the overhead of managing these threads negate the gain, given that I am looking for a throughput of 20 fps? (I assume the target frame rate affects the answer, since it indicates how much processing each thread has to do.)
Would multithreading help here?
Are there any tips for increasing the speed of OpenCV specifically, or any pitfalls I might be falling into that reduce speed?
Thanks.
Upvotes: 9
Views: 3595
Reputation: 10985
As example code for multi-threaded image processing with OpenCV, you might want to check out my code:
https://github.com/vmlaker/sherlock-cpp
It's what I came up with to take advantage of an x-core CPU to improve the performance of object detection. The detect program is basically a parallel algorithm that distributes tasks among multiple threads, with a separate pipelined thread for every task.
With memory for every captured frame shared between all threads, I got great performance and CPU utilization.
Upvotes: 1
Reputation: 24907
If your threads can operate on different data, it would seem reasonable to thread it off, perhaps queueing each frame object to a thread pool. You may have to add sequence numbers to the frame objects to ensure that the processed frames emerging from the pool are delivered in the same order they went in.
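To make the sequence-number idea concrete, here is a minimal sketch using only the C++ standard library. Everything in it is an assumption for illustration: `Frame`, `processFrame`, `ReorderBuffer` and `processInOrder` are hypothetical names, the `int` payload stands in for real image data, and sequence numbers are assumed to start at 0 with no gaps. Worker threads pull frames from a shared index, and a small reorder buffer holds back any result that finishes early until its predecessors have arrived.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a captured frame; a real program would hold image data.
struct Frame {
    std::uint64_t seq;  // sequence number assigned at capture time (0, 1, 2, ...)
    int payload;        // placeholder for the pixels
};

// Hypothetical per-frame work; replace with the real processing step.
int processFrame(const Frame& f) { return f.payload * 2; }

// Accepts results in any order and releases them strictly by sequence number.
class ReorderBuffer {
public:
    void push(std::uint64_t seq, int result) {
        std::lock_guard<std::mutex> lock(mutex_);
        pending_[seq] = result;
        // Deliver every result whose predecessors have all been delivered.
        while (!pending_.empty() && pending_.begin()->first == next_) {
            delivered_.push_back(pending_.begin()->second);
            pending_.erase(pending_.begin());
            ++next_;
        }
    }
    std::vector<int> delivered() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return delivered_;
    }
private:
    mutable std::mutex mutex_;
    std::map<std::uint64_t, int> pending_;  // results waiting for earlier frames
    std::uint64_t next_ = 0;                // next sequence number to deliver
    std::vector<int> delivered_;
};

// Processes the frames on `workers` threads and returns results in capture order.
std::vector<int> processInOrder(const std::vector<Frame>& frames, unsigned workers) {
    assert(workers > 0);
    ReorderBuffer out;
    std::atomic<std::size_t> nextIndex{0};
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        pool.emplace_back([&] {
            for (;;) {
                const std::size_t i = nextIndex.fetch_add(1);  // claim the next frame
                if (i >= frames.size()) break;
                out.push(frames[i].seq, processFrame(frames[i]));
            }
        });
    }
    for (auto& t : pool) t.join();
    return out.delivered();
}
```

Calling `processInOrder(frames, 4)` should return the results in capture order regardless of which worker finishes first. With a live camera, the vector of frames would be replaced by a blocking queue fed by the capture thread.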
Upvotes: 2
Reputation: 60034
The easier way, I think, could be pipelining frame operations.
You could work with a thread pool, handing each frame's memory buffer in sequence to the first available thread; the buffer is released back to the pool when the algorithm step on that frame has completed.
This could leave your current (debugged :) algorithm practically unchanged, but it will require substantially more memory for buffering intermediate results.
Of course, without details about your task, it's hard to say if this is appropriate...
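As a rough illustration of the buffer-pool idea, the sketch below keeps a fixed number of frame buffers and blocks the capture loop while all of them are in flight. It is only an assumption of how this could look: `BufferPool` and `processFrames` are hypothetical names, the "capture" is simulated by filling the buffer with a constant, the processing step is just a byte sum, and for brevity it starts one thread per frame where a real pool would reuse its threads.

```cpp
#include <algorithm>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <memory>
#include <mutex>
#include <numeric>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical fixed-size pool of frame buffers shared by all threads.
class BufferPool {
public:
    using Buffer = std::vector<unsigned char>;

    BufferPool(std::size_t count, std::size_t bytesPerFrame) {
        assert(count > 0);
        for (std::size_t i = 0; i < count; ++i)
            free_.push_back(std::make_unique<Buffer>(bytesPerFrame));
    }
    // Blocks until a buffer is free, so memory use stays bounded.
    std::unique_ptr<Buffer> acquire() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return !free_.empty(); });
        std::unique_ptr<Buffer> b = std::move(free_.back());
        free_.pop_back();
        return b;
    }
    // Returns a buffer to the pool once the step on its frame has completed.
    void release(std::unique_ptr<Buffer> b) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            free_.push_back(std::move(b));
        }
        cv_.notify_one();
    }
    std::size_t available() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return free_.size();
    }
private:
    mutable std::mutex mutex_;
    std::condition_variable cv_;
    std::vector<std::unique_ptr<Buffer>> free_;
};

// Simulates capturing `frameCount` frames and processing each on its own thread.
std::vector<int> processFrames(BufferPool& pool, int frameCount) {
    assert(frameCount >= 0);
    std::vector<int> results(static_cast<std::size_t>(frameCount));
    std::vector<std::thread> threads;
    for (int i = 0; i < frameCount; ++i) {
        auto buf = pool.acquire();  // waits while every buffer is in flight
        std::fill(buf->begin(), buf->end(), static_cast<unsigned char>(i));  // simulated capture
        threads.emplace_back([&pool, &results, i, b = std::move(buf)]() mutable {
            // Placeholder processing step: sum the bytes of the frame.
            results[static_cast<std::size_t>(i)] = std::accumulate(b->begin(), b->end(), 0);
            pool.release(std::move(b));
        });
    }
    for (auto& t : threads) t.join();
    return results;
}
```

The pool size is the knob that trades memory for parallelism: with `BufferPool pool(2, frameBytes)` at most two frames are ever being processed at once, and a larger count lets more frames overlap at the cost of more buffered memory.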
Upvotes: 6
Reputation: 16785
Unless the particular algorithm you are using is already optimized for a multithreaded/parallel platform, throwing it at an x-core processor will do nothing for you. The algorithm has to be inherently threadable to benefit from multiple threads; if it wasn't designed with that in mind, it would have to be altered. On the other hand, many image processing algorithms are "embarrassingly parallel", at least in concept. Can you share more details about the algorithm you have in mind?
Upvotes: 3
Reputation: 22245
One important way to increase speed in OpenCV is related to neither the processor nor the algorithm: avoiding extra copies when dealing with matrices. Here is an example taken from the documentation:
"...by constructing a header for a part of another matrix. It can be a single row, single column, several rows, several columns, rectangular region in the matrix (called a minor in algebra) or a diagonal. Such operations are also O(1), because the new header will reference the same data. You can actually modify a part of the matrix using this feature, e.g."
// add 5-th row, multiplied by 3 to the 3rd row
M.row(3) = M.row(3) + M.row(5)*3;
// now copy 7-th column to the 1-st column
// M.col(1) = M.col(7); // this will not work
Mat M1 = M.col(1);
M.col(7).copyTo(M1);
Maybe you already knew about this, but I think it is worth highlighting headers in OpenCV as an important and efficient coding tool.
Upvotes: 5
Reputation: 391
Assuming I have an x-core processor, does splitting the processing into x threads actually speed things up?
Yes, although it very heavily depends on the particular algorithm being used, as well as your skill in writing threaded code to handle things like synchronization. You didn't really provide enough detail to make a better assessment than that.
Some algorithms are extremely easy to parallelize, like ones that have the form:
for (int i = 0; i < DATA_SIZE; i++)
{
    output[i] = f(input[i]);
}
for some function f. These are known as embarrassingly parallel; you can simply split the data into N blocks and have each of N threads process one block independently. Libraries like OpenMP make this kind of threading extremely simple.
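For what it's worth, the block-splitting approach can be sketched with plain `std::thread` as follows. The names `parallelMap` and the squaring `f` are made up for the example, and the element type is assumed to be `int`; each thread writes only to its own slice of the output, so no locking is needed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical per-element function standing in for f in the loop above.
int f(int x) { return x * x; }

// Splits the input into at most numThreads contiguous blocks, one thread each.
std::vector<int> parallelMap(const std::vector<int>& input, unsigned numThreads) {
    assert(numThreads > 0);
    std::vector<int> output(input.size());
    // Round up so that every element falls into some block.
    const std::size_t blockSize = (input.size() + numThreads - 1) / numThreads;
    std::vector<std::thread> threads;
    for (unsigned t = 0; t < numThreads; ++t) {
        const std::size_t begin = std::min(input.size(), t * blockSize);
        const std::size_t end = std::min(input.size(), begin + blockSize);
        if (begin == end) break;  // fewer elements than threads: nothing left to do
        threads.emplace_back([&input, &output, begin, end] {
            // Each thread touches only indices in [begin, end).
            for (std::size_t i = begin; i < end; ++i) output[i] = f(input[i]);
        });
    }
    for (auto& th : threads) th.join();
    return output;
}
```

A reasonable starting value for `numThreads` is `std::thread::hardware_concurrency()`, which may return 0 if the count is unknown. With OpenMP, roughly the same effect comes from putting `#pragma omp parallel for` above the original loop and compiling with OpenMP enabled.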
Upvotes: 4