OpenMP and explicit thread interoperability when using OpenCV

Question

I'm using OpenCV in a commercial application and don't have management's permission to purchase a TBB license, so I built OpenCV with OpenMP as the parallelism framework.

All the machine vision cameras we use as real-time frame sources have SDKs that fill frame buffers in a circular queue and call user-supplied callbacks to process them concurrently, on threads from the SDKs' own thread pools.

This works fine when OpenMP isn't in the picture: I do a bunch of (memoryless) processing on individual frames, then serialize them through interthread buffers to feed the stateful processing stage, where frames need to be processed in order. If it were just the concurrent frame processing, I wouldn't need OpenMP at all; however, I need to leave it enabled in OpenCV so that the in-order frame processing is accelerated as well.

My concern is how well I can expect OpenMP to work when it's used in the first phase, i.e. in the concurrently executed callbacks running on threads created explicitly by the camera SDKs. Can I assume the OpenMP runtime is smart enough to use its thread pool efficiently when parallel regions are being triggered from multiple externally created threads?

The platform is guaranteed to be x86-64 (VC++15 or GCC).

bazza · Accepted Answer

Situation

If I've understood the question properly, the camera library you're using will spawn a number of threads, and each one of those will call your callback function. Inside your callback you want to use OpenMP to accelerate that processing. The results of that are sent through some interthread channel to a pipeline of threads doing more processing.

If that's wrong, please ignore the rest of this answer!

Rest Of Answer

Using OpenMP in the callbacks would seem to be chopping the compute load of this part of your application up into little pieces for not much benefit. The camera library is already overlapping the processing of frames. Using OpenMP here is going to mean that the processing of frames doesn't actually overlap (but the camera library is still using multiple threads as if it does).

If it does still overlap, then logically speaking you haven't got enough cores in your system to keep up with the overall workload anyway (assuming your use of OpenMP resulted in all cores being maxed out processing a single frame)... I'm assuming that your system is successfully keeping up with the flow of frames, and therefore must have enough grunt to be able to do so.

So, I think it won't really be a question of whether OpenMP will be intelligent in its use of its thread pool; the thread pool will be dedicated to processing a single frame, and it will complete before the next frame arrives.
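To make the scenario concrete, here's a rough sketch of several externally created threads each entering an OpenMP parallel region from a callback. Everything in it is made up for illustration: `process_frame` and `run_callbacks` are hypothetical names, `std::thread` stands in for the camera SDK's thread pool, and the pixel inversion stands in for whatever OpenCV call you'd really make. The `num_threads` clause is my own addition, showing one way to cap the team size per callback in case the callbacks do overlap. How a given runtime shares worker threads between teams started from different external threads is implementation-specific, so measure on your target compilers rather than trusting this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical stand-in for the memoryless per-frame work done in the callback.
// If the compiler is not run with OpenMP enabled, the pragma is ignored and
// the loop simply runs serially on the calling thread.
void process_frame(std::vector<std::uint8_t>& frame, int team_size)
{
    assert(team_size > 0);  // num_threads requires a positive value
    const int n = static_cast<int>(frame.size());
    #pragma omp parallel for num_threads(team_size)
    for (int i = 0; i < n; ++i) {
        frame[i] = static_cast<std::uint8_t>(255 - frame[i]);  // invert the pixel
    }
}

// Simulates the SDK: `sdk_threads` threads that OpenMP did not create,
// each invoking the callback on its own frame buffer.
std::vector<std::vector<std::uint8_t>> run_callbacks(int sdk_threads,
                                                     int team_size,
                                                     std::size_t frame_bytes)
{
    std::vector<std::vector<std::uint8_t>> frames(
        static_cast<std::size_t>(sdk_threads),
        std::vector<std::uint8_t>(frame_bytes, 10));

    std::vector<std::thread> pool;
    for (int t = 0; t < sdk_threads; ++t) {
        pool.emplace_back([&frames, t, team_size] {
            process_frame(frames[static_cast<std::size_t>(t)], team_size);
        });
    }
    for (auto& th : pool) {
        th.join();
    }
    return frames;
}
```

Calling `run_callbacks(4, 2, 1024)` would run four "SDK" threads, each asking for a team of at most two OpenMP threads. Each frame is only ever touched by the team of the thread that owns it, so no extra locking is needed in this sketch.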

The non-overlapping does mean the latency is lower, which might be what you want. However, you could achieve the same thing if the camera library used a single thread, with your callback using OpenMP (and taking on the responsibility to complete before the next frame arrives). With less thread context switching going on it would even be a tiny bit quicker. So if you can stop the library spawning all those threads (maybe there's a config parameter, or environment variable, or some other part of its API), it might be worth it.
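A minimal sketch of that single-thread variant, assuming the library can be told to deliver frames on one thread: the callback uses a default-sized OpenMP team for the whole frame and checks for itself whether it finished within the frame period. `on_frame`, `FrameResult` and the checksum loop are hypothetical placeholders for your real callback and processing, not part of any camera SDK or of OpenCV.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <vector>

// Hypothetical result type: the processed output (here just a checksum)
// plus a flag saying whether the work fit inside one frame period.
struct FrameResult {
    std::uint64_t checksum;
    bool met_deadline;
};

// Hypothetical callback, invoked by a single SDK thread for each frame in turn.
// The parallel region uses the runtime's default team size, so one frame at a
// time gets all the cores OpenMP is willing to use.
FrameResult on_frame(const std::vector<std::uint8_t>& frame,
                     std::chrono::microseconds frame_period)
{
    assert(frame_period.count() > 0);
    const auto start = std::chrono::steady_clock::now();

    const int n = static_cast<int>(frame.size());
    long long sum = 0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; ++i) {
        sum += frame[i];  // placeholder for the real per-pixel work
    }

    const auto elapsed = std::chrono::steady_clock::now() - start;
    return { static_cast<std::uint64_t>(sum), elapsed < frame_period };
}
```

If `met_deadline` starts coming back false, the single-thread arrangement isn't keeping up and you'd be back to needing overlapped callbacks or more cores.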
