shashashamti2008

Reputation: 2337

How to eliminate the overhead of using thread pools in nested for-loops

I am trying to parallelize the p2 loop, the outermost level of the inner loop nest below, in C++. The outer loops must stay single-threaded because they operate on shared data.

// OUTER LOOP
for ( int l = 0; l < NT; l++ ) {
    for ( int k = 0; k < NZ; k++ ) {
        for ( int j = 0; j < NY; j++ ) {
            for ( int i = 0; i < NX; i++ ) {
    
                // define p0Start/p0End, p1Start/p1End and p2Start/p2End
                // ...
                
                // INNER LOOP (a 16x16x16 loop and it takes 20us on a single thread)   
                for ( int p2 = p2Start; p2 < p2End; p2++ ) {
                    for ( int p1 = p1Start; p1 < p1End; p1++ ) {
                        for ( int p0 = p0Start; p0 < p0End; p0++ ) {            
                            // performing inner loop computation...
                            std::this_thread::sleep_for(std::chrono::microseconds(20));
                        }
                    }
                }
            }
        }
    }
}

I have tried bshoshany/thread-pool:

BS::thread_pool pool;
BS::multi_future<void> loop_future = pool.submit_loop(p2Start, p2End, [&](int p2) {
    for ( int p1 = p1Start; p1 < p1End; p1++ ) {
        for ( int p0 = p0Start; p0 < p0End; p0++ ) {
            // performing inner loop computation...
            std::this_thread::sleep_for(std::chrono::microseconds(20));
        }
    }
});
loop_future.wait(); // block until every p2 iteration has finished

I have also tried Intel TBB's parallel_for, shown below. Both approaches add substantial overhead: the inner loop takes significantly longer than its 20 us single-threaded time, with no speedup.

tbb::parallel_for(p2Start, p2End, [&](int p2) {
    for ( int p1 = p1Start; p1 < p1End; p1++ ) {
        for ( int p0 = p0Start; p0 < p0End; p0++ ) {
            // performing inner loop computation...
            std::this_thread::sleep_for(std::chrono::microseconds(20));
        }
    }
});
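
To put a number on that overhead, the round trip of handing one empty task to another thread can be timed on its own and compared with the 20 us budget. The following is only a rough sketch: `averageMicros` and `dispatchRoundTripMicros` are hypothetical helper names, and `std::async` is used as a stand-in for a pool submit (a pool reuses its threads, so its cost is probably lower, but it can be measured the same way).

```cpp
#include <cassert>
#include <chrono>
#include <future>

// Hypothetical helper: average wall-clock time of one fn() call, in microseconds.
template <class Fn>
double averageMicros(Fn&& fn, int reps) {
    assert(reps > 0);
    const auto t0 = std::chrono::steady_clock::now();
    for (int r = 0; r < reps; ++r) fn();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::micro>(t1 - t0).count() / reps;
}

// Time to hand one empty task to another thread and wait for it to finish.
// Assumption: std::async stands in for the pool's submit call here.
double dispatchRoundTripMicros(int reps) {
    return averageMicros([] {
        std::async(std::launch::async, [] {}).get();
    }, reps);
}
```

If the measured round trip is of the same order as the serial inner loop, splitting that loop across threads cannot pay off regardless of the library used.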

Is there a way to use multiple threads for this task with low enough overhead to achieve a speedup?
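
One direction that might be worth testing is to keep a fixed team of threads alive for the whole outer loop and let them busy-wait on an atomic counter between inner loops, so that a dispatch involves no thread creation or wake-up. The following is a minimal sketch under that assumption, not a tuned implementation. `SpinTeam`, `parallelFor` and `demoSum` are hypothetical names that are not part of BS::thread_pool or TBB, and whether it beats the serial time depends on the core count and the real workload.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper: a fixed team of threads that busy-wait for work.
class SpinTeam {
public:
    explicit SpinTeam(int workers) : parts_(workers + 1) {
        assert(workers >= 0);
        for (int w = 1; w <= workers; ++w)
            threads_.emplace_back([this, w] { workerLoop(w); });
    }
    ~SpinTeam() {
        stop_.store(true, std::memory_order_release);
        for (auto& t : threads_) t.join();
    }
    SpinTeam(const SpinTeam&) = delete;
    SpinTeam& operator=(const SpinTeam&) = delete;

    // Runs body(i) for every i in [begin, end). The range is cut into
    // parts_ contiguous chunks; the calling thread runs chunk 0 itself
    // and returns only after all other chunks have finished.
    void parallelFor(int begin, int end, const std::function<void(int)>& body) {
        begin_ = begin;
        end_ = end;
        body_ = &body;
        pending_.store(parts_ - 1, std::memory_order_relaxed);
        generation_.fetch_add(1, std::memory_order_release);  // publish the job
        runChunk(0);
        unsigned spins = 0;
        while (pending_.load(std::memory_order_acquire) != 0) relax(spins);
    }

private:
    // Mostly spin, but yield occasionally so an oversubscribed machine
    // still makes progress.
    static void relax(unsigned& spins) {
        if ((++spins & 1023u) == 0) std::this_thread::yield();
    }
    void runChunk(int part) const {
        const long long n = static_cast<long long>(end_) - begin_;
        const int lo = begin_ + static_cast<int>(n * part / parts_);
        const int hi = begin_ + static_cast<int>(n * (part + 1) / parts_);
        for (int i = lo; i < hi; ++i) (*body_)(i);
    }
    void workerLoop(int part) {
        unsigned seen = 0, spins = 0;
        for (;;) {
            // Wait until the caller publishes a new job or asks us to stop.
            while (generation_.load(std::memory_order_acquire) == seen) {
                if (stop_.load(std::memory_order_acquire)) return;
                relax(spins);
            }
            ++seen;
            runChunk(part);
            pending_.fetch_sub(1, std::memory_order_release);  // report done
        }
    }

    const int parts_;
    int begin_ = 0, end_ = 0;
    const std::function<void(int)>* body_ = nullptr;
    std::atomic<unsigned> generation_{0};
    std::atomic<int> pending_{0};
    std::atomic<bool> stop_{false};
    std::vector<std::thread> threads_;
};

// Demo: the outer loop stays on the calling thread and reuses one team.
// Each pass adds p2 * p2 for p2 in [0, 16) to a shared atomic total.
long long demoSum(int outerIterations) {
    SpinTeam team(3);  // 3 workers + the caller = 4 chunks of the p2 range
    std::atomic<long long> total{0};
    for (int iter = 0; iter < outerIterations; ++iter) {
        team.parallelFor(0, 16, [&](int p2) {
            total.fetch_add(p2 * p2, std::memory_order_relaxed);
        });
    }
    return total.load();
}
```

In this sketch the outer loops never leave the calling thread, which matches the shared-data constraint above. The trade-off is that the workers occupy their cores even while idle, so the team size should stay below the number of physical cores.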

Upvotes: 0

Views: 54

Answers (0)
