I'm using concurrency::parallel_for() in a Windows app (Visual Studio 2017) to divide up some work in a loop. Everything worked just fine, but I was concerned about locking performance, so I tried various methods: std::mutex, the Windows CRITICAL_SECTION, etc.

Then I tried concurrency::critical_section. The documentation makes it sound like it should be faster, since it is aware of the Concurrency Runtime.

Nope. Not only is it not faster, in some cases it's downright dangerous. At first it was just blowing up my application; in the debugger I could see concurrency just creating infinite threads. When I changed the partitioner from the default to the static one, things started working again, but everything was much, much slower than with the Windows CRITICAL_SECTION or even std::mutex.
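The question doesn't show how the lock comparisons were timed, so the harness below is only a hedged sketch of one plausible setup (the function name and iteration count are assumptions). It uses std::mutex so it builds anywhere; on Windows the same template could be instantiated with concurrency::critical_section (which also exposes lock()/unlock()) or with a small RAII wrapper around CRITICAL_SECTION:

```cpp
// Hypothetical micro-benchmark sketch for comparing lock types.
#include <chrono>
#include <mutex>

// Times `iterations` lock/unlock round trips around a trivial increment,
// mirroring the lock()/unlock() shape used in the lambda below. Returns
// elapsed microseconds. Works with any type exposing lock()/unlock().
template <typename Lock>
long long timeLockedIncrements(Lock& lock, long long iterations, long long& counter)
{
    const auto start = std::chrono::steady_clock::now();
    for (long long i = 0; i < iterations; ++i) {
        lock.lock();
        ++counter;
        lock.unlock();
    }
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
}
```

On Windows you would run the same template once per lock type and compare the returned timings; a single-threaded run like this measures only the uncontended cost, so a real comparison would also run it from several threads at once.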
I'm wondering if someone can explain to me either of the following:

1. Why does using concurrency::critical_section in my lambda (below) with the default partitioner make concurrency create infinite threads?
2. Why (with the static_partitioner) is the parallel_for() so much slower with concurrency::critical_section than with other locking mechanisms?
3. Should I just avoid concurrency::critical_section?

Here is my code:
#include <ppl.h>

void nonlinearReconstruction(const std::vector<Image>& window,
                             const Rect& rect,
                             Image& normals)
{
    concurrency::critical_section mtx;

    // This lambda uses the critical section "mtx" to control access to the
    // shared image data in "normals". It reads pixels, does math on them,
    // and then sets other pixels. (cols, totalix, and ntemp come from setup
    // and math code omitted here.)
    const auto op =
        [&normals, cols, rect, &window, &mtx](int linearix)
        {
            // Determine which indices to use.
            const auto r = static_cast<int>(linearix / cols);
            const auto c = static_cast<int>(linearix % cols);
            const auto r0 = r + rect.top();
            const auto c0 = c + rect.left();
            const auto c1 = std::max(c + rect.left() - 1, 0);
            const auto r1 = r0;
            const auto r2 = std::max(r + rect.top() - 1, 0);
            const auto c2 = c + rect.left();

            // Lock the critical section to read shared pixels in "normals".
            mtx.lock();
            const auto ninit = normals.getpel(r0, c0).asArray();
            const auto npx = normals.getpel(r1, c1).asArray();
            const auto npy = normals.getpel(r2, c2).asArray();
            mtx.unlock();

            // Do heavy-duty math on these pixels. I've left out the code, but
            // no locking of any kind is done -- just math on local data.
            // ... blah blah blah

            // Lock again to set the corrected pixel in shared memory.
            mtx.lock();
            normals.setpel(
                r + rect.top(),
                c + rect.left(),
                NormalVector(ntemp[0], ntemp[1], ntemp[2]));
            mtx.unlock();
        };

    // Now call the parallel_for loop with the lambda above.
    // This version causes infinite thread creation:
    concurrency::parallel_for(0, (int)totalix, op);

    // This version works but performs much more slowly with
    // concurrency::critical_section than with std::mutex or the
    // Windows CRITICAL_SECTION:
    // concurrency::parallel_for(0, (int)totalix, op, concurrency::static_partitioner());
}
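For comparison, here is a portable, simplified sketch of the same lock-read / compute / lock-write pattern using std::mutex and a fixed pool of std::thread (roughly what the static partitioner provides). The flat-vector "image", the averaging math, and all names here are stand-ins, not the Image/Rect types from the code above:

```cpp
// Hypothetical portable repro of the locking pattern from the question,
// built on std::mutex/std::thread instead of the Concurrency Runtime.
#include <algorithm>
#include <mutex>
#include <thread>
#include <vector>

// Toy "image": a flat vector of doubles indexed as row * cols + col.
void reconstructWithStdMutex(std::vector<double>& normals, int rows, int cols)
{
    std::mutex mtx;
    const int total = rows * cols;

    // Same shape as the lambda above: lock to read, compute unlocked,
    // lock again to write.
    auto op = [&](int linearix) {
        const int r = linearix / cols;
        const int c = linearix % cols;
        const int c1 = std::max(c - 1, 0);  // left neighbor, clamped
        const int r2 = std::max(r - 1, 0);  // upper neighbor, clamped

        double ninit, npx, npy;
        {
            std::lock_guard<std::mutex> lock(mtx);
            ninit = normals[r * cols + c];
            npx = normals[r * cols + c1];
            npy = normals[r2 * cols + c];
        }

        // Stand-in for the heavy math: just average the three samples.
        const double result = (ninit + npx + npy) / 3.0;

        {
            std::lock_guard<std::mutex> lock(mtx);
            normals[r * cols + c] = result;
        }
    };

    // Hand out strided slices of the index range to a fixed set of threads,
    // roughly what concurrency::static_partitioner does.
    const unsigned nthreads = std::max(1u, std::thread::hardware_concurrency());
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < nthreads; ++t) {
        pool.emplace_back([&, t] {
            for (int i = static_cast<int>(t); i < total;
                 i += static_cast<int>(nthreads))
                op(i);
        });
    }
    for (auto& th : pool)
        th.join();
}
```

Because the thread count is fixed up front, this version cannot oversubscribe the machine the way the default-partitioned parallel_for did; it serves as a baseline for what std::mutex costs under the same access pattern.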
A few things I've checked: concurrency::critical_section is not a recursive lock (but neither is std::mutex, and that works just fine).