Vitor Py
Vitor Py

Reputation: 5180

Concurrency and optimization using OpenMP

I'm learning OpenMP. To do so, I'm trying to make an existing code parallel. But I seems to get an worse time when using OpenMP than when I don't.

My inner loop:

    #pragma omp parallel for
    for(unsigned long j = 0; j < c_numberOfElements; ++j)
    {
        //int th_id = omp_get_thread_num();
        //printf("thread %d, j = %d\n", th_id, (int)j);

        Point3D current;
        #pragma omp critical
        {
            current = _points[j];
        }

        Point3D next = getNext(current);

        if (!hasConstraint(next))
        {
            continue;
        }

        #pragma omp critical
        {
            _points[j] = next;
        }
    }

_points is a pointMap_t, defined as:

typedef boost::unordered_map<unsigned long, Point3D> pointMap_t;

Without OpenMP my running time is 44.904s. With OpenMP enabled, on a computer with two cores, it is 64.224s. What I am doing wrong?

Upvotes: 0

Views: 1136

Answers (4)

Jonathan Dursi
Jonathan Dursi

Reputation: 50927

I agree that it would be best to see some working code.

The ultimate issue here is that there are criticals within a parallel region, and criticals are (a) enormously expensive in and of themselves, and (b) by definition, kill parallelism. The assignment to current certainl doesn't need to be inside a critical, as it is private; I wouldn't have thought the _points[j] assignment would be, either, but I don't know what the map stuff does, so there you go.

But you have a loop in which you have a huge amount of overhead, which grows linearly in the number of threads (the two critical regions) in order to do a tiny amount of actual work (walk along a linked list, it looks like). That's never going to be a good trade...

Upvotes: 0

florin
florin

Reputation: 14326

You need to show the rest of the code. From a comment to another answer, it seems you are using a map. That is really a bad idea, especially if you are mapping 0..n numbers to values: why don't you use an array?

If you really need to use containers, consider using the ones from the Intel's Thread Building Blocks library.

Upvotes: 0

Steve Townsend
Steve Townsend

Reputation: 54128

It seems possible that the lookup and write to _points in critical sections is dragging down the performance when you use OpenMP. Single-threaded, this will not result in any contention.

Sharing seed data like this seems counterproductive in a parallel programming context. Can you restructure to avoid these contention points?

Upvotes: 0

High Performance Mark
High Performance Mark

Reputation: 78316

Why have you wrapped your reads and writes to _points[j] in critical sections ? I'm not much of a C++ programmer, but it doesn't look to me as if you need those sections at all. As you've written it (uunamed critical sections) each thread is going to wait while the other goes through each of the sections. This could easily make the program slower.

Upvotes: 1

Related Questions