Reputation: 357
I have this OpenCV image processing function that is called four times on four different Mat objects.
void processBinary(Mat& binaryMat) {
//image processing
}
I want to multi-thread it so that all 4 method calls complete at the same time, but have the main thread wait until each thread is done.
Ex:
int main() {
Mat m1, m2, m3, m4;
//perform each of these methods simultaneously, but have main thread wait for all processBinary() calls to finish
processBinary(m1);
processBinary(m2);
processBinary(m3);
processBinary(m4);
}
What I hope to accomplish is to be able to call processBinary() as many times as I need and have it take about as long as calling the method only once. I have looked up multithreading, but am a little confused about creating threads and then joining / detaching them. I believe I need to instantiate each thread and then call join() on each one so that the main thread waits for all of them to finish, but doing so doesn't seem to significantly reduce execution time. Can anyone explain how I should go about multi-threading my program? Thanks!
EDIT: What I have tried:
//this does not significantly reduce execution time. However, calling processBinary() only once does.
thread p1(&Detector::processBinary, *this, std::ref(m1));
thread p2(&Detector::processBinary, *this, std::ref(m2));
thread p3(&Detector::processBinary, *this, std::ref(m3));
thread p4(&Detector::processBinary, *this, std::ref(m4));
p1.join();
p2.join();
p3.join();
p4.join();
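For reference, a minimal self-contained sketch of the launch-then-join pattern described above. Image and Detector are hypothetical stand-ins (for cv::Mat and the asker's class, whose real definition is not shown), and processAll is an invented helper name. One detail worth noting: std::thread copies its arguments, so passing *this as in the attempt above runs the member function on a copy of the object; the sketch passes the pointer this instead.

```cpp
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for cv::Mat so the sketch compiles without OpenCV.
struct Image {
    std::vector<int> pixels;
};

// Hypothetical Detector; processBinary just thresholds each pixel in place.
class Detector {
public:
    void processBinary(Image& img) {
        for (int& p : img.pixels) p = (p > 127) ? 255 : 0;
    }

    // Launch one thread per image, then block until all of them finish.
    void processAll(std::vector<Image>& images) {
        std::vector<std::thread> workers;
        workers.reserve(images.size());
        for (Image& img : images) {
            // Pass `this` (a pointer), not `*this`: std::thread copies its
            // arguments, so `*this` would run on a copy of the Detector.
            workers.emplace_back(&Detector::processBinary, this, std::ref(img));
        }
        for (std::thread& t : workers) t.join();
    }
};
```

processAll returns only after every worker has been joined. Whether this beats the sequential calls depends on how many hardware threads are actually free; std::thread::hardware_concurrency() gives a hint.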
Upvotes: 5
Views: 10499
Reputation: 1402
If you're using Python, you can use VidGear, my open-source multi-threaded wrapper library for OpenCV, available on GitHub and PyPI, to achieve higher FPS.
VidGear is a lightweight Python wrapper around OpenCV's Video I/O module. It contains multi-threaded modules (gears) that enable high-speed video frame capture across various devices and platforms.
Key features that differentiate it from other existing multi-threaded open-source solutions are:
Multi-threaded, high-speed OpenCV video frame capture (resulting in high FPS)
Flexible, direct control over the video stream with easy manipulation
Lightweight
Built-in robust error handling and frame synchronization
Multi-platform compatibility (also compatible with the Raspberry Pi Camera)
Full support for network video streams (including GStreamer raw video capture pipelines)
Upvotes: 0
Reputation: 52397
The slick way to achieve this is not to do the thread housekeeping yourself but use a library that provides micro-parallelization.
OpenCV itself can use Intel Threading Building Blocks (TBB), when built with TBB support, for exactly this task: running loops in parallel.
In your case, your loop has just four iterations. With C++11, you can write it down very easily using a lambda expression. In your example:
#include <tbb/parallel_for.h>

std::vector<cv::Mat> input = { m1, m2, m3, m4 };
// Capture by reference: processBinary() takes a non-const Mat&.
tbb::parallel_for(size_t(0), input.size(), size_t(1), [&](size_t i) {
    processBinary(input[i]);
});
For this example I took code from here.
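If adding TBB as a dependency is not an option, roughly the same loop can be sketched with std::async from the standard library. This is a swapped-in technique, not the TBB call above: it starts one task per image instead of scheduling iterations on a shared pool. Image, the body of processBinary, and the helper name processAllAsync are hypothetical stand-ins so that the sketch compiles without OpenCV.

```cpp
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical stand-in for cv::Mat so the sketch compiles without OpenCV or TBB.
struct Image {
    std::vector<int> pixels;
};

// Placeholder for the real image processing: invert each pixel in place.
void processBinary(Image& img) {
    for (int& p : img.pixels) p = 255 - p;
}

// Run processBinary on every image concurrently and wait for all to finish.
void processAllAsync(std::vector<Image>& input) {
    std::vector<std::future<void>> pending;
    pending.reserve(input.size());
    for (std::size_t i = 0; i < input.size(); ++i) {
        // std::launch::async requests a separate thread for each task.
        // `input` is captured by reference so the originals are modified.
        pending.push_back(std::async(std::launch::async, [&input, i] {
            processBinary(input[i]);
        }));
    }
    // get() blocks until the task is done and rethrows any exception it threw.
    for (auto& f : pending) f.get();
}
```

With only four images the per-task thread cost is small; for many small work items a pooled scheduler such as TBB is likely the better fit.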
Upvotes: 6