Reputation: 889
Let's say we have two classes:
class A
{
public:
    void run_all()
    {
        #pragma omp task
        f1a();
        #pragma omp task
        f1b();
    }
    void f1a()
    { /*some code*/ }
    void f1b()
    { /*some code*/ }
};
class B
{
public:
    void run_all()
    {
        #pragma omp task
        f2a();
        #pragma omp task
        f2b();
    }
    void f2a()
    { /*some code*/ }
    void f2b()
    { /*some code*/ }
};
And then we have the following code that uses these classes:
int main()
{
    A *a = new A();
    B *b = new B();
    #pragma omp parallel
    {
        #pragma omp single
        {
            a->run_all();
            b->run_all();
        }
    }
    delete a;
    delete b;
}
I need to ensure that the four subfunctions are executed in this order: f1a(), f1b(), f2a(), f2b(). To accomplish this I would need to use task dependencies with the depend clause and its in/out/inout dependence types. In other words, the four subfunctions would need to look something like this:
#pragma omp task depend(inout: x)
fXa();
where they all utilize a common variable x. How could I define such a variable? Would it need to be a global variable in C++, or can it be done in some other way without resorting to global variables? Do I need to define a dummy variable x just to express the task dependency, even if I am not planning on using the variable anywhere else? It seems a bit odd to me that a variable would need to be defined for such a purpose...
BTW: yes, I know this could trivially be done by removing all the omp pragmas to make the problem sequential, but that is not the answer I am looking for, as I have multiple such sub-pieces of code that I want to run in parallel.
Upvotes: 2
Views: 592
Reputation: 50518
Using a global variable for such a use is probably not a good idea, since it would create a hidden dependency (i.e. "spooky action at a distance") if multiple instances of A and B exist. One way to fix that is to use an explicit object to share a dependency between A and B. This explicit object can contain the storage location required by OpenMP for the tasks to be serialized.
Here is the resulting code:
// Can contain multiple fields in the future in order to
// support more complex synchronization patterns.
struct DepHandler
{
    char seqTag;
};
class A
{
public:
    A(DepHandler* dep) : m_dep(dep) { }
    void run_all()
    {
        #pragma omp task depend(inout: m_dep->seqTag)
        f1a();
        #pragma omp task depend(inout: m_dep->seqTag)
        f1b();
    }
    void f1a() { /*some code*/ }
    void f1b() { /*some code*/ }
private:
    DepHandler* m_dep;
};
class B
{
public:
    B(DepHandler* dep) : m_dep(dep) { }
    void run_all()
    {
        #pragma omp task depend(inout: m_dep->seqTag)
        f2a();
        #pragma omp task depend(inout: m_dep->seqTag)
        f2b();
    }
    void f2a() { /*some code*/ }
    void f2b() { /*some code*/ }
private:
    DepHandler* m_dep;
};
int main()
{
    DepHandler dep;
    A a(&dep);
    B b(&dep);
    #pragma omp parallel
    {
        #pragma omp single
        {
            a.run_all();
            b.run_all();
        }
    }
}
This seems a bit cumbersome, but in fact such an approach has many benefits. Indeed, it works well even if A and B are defined in different compilation units (which is often the case in object-oriented projects). They could even be defined in two completely different projects compiled separately. More serialized tasks can be added in the future, and possibly more tags (for example, if some tasks can be split into smaller ones with a less strict execution ordering, as sketched below).
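As a sketch of that relaxed variant (the chainA and chainB field names are hypothetical): if each class only needed its own tasks to run in order, DepHandler could hold one tag per independent chain, so the two chains can interleave while each stays internally ordered:
// Hypothetical variant: one tag per independent ordering chain.
struct DepHandler
{
    char chainA; // serializes f1a() -> f1b()
    char chainB; // serializes f2a() -> f2b()
};
A::run_all() would then use depend(inout: m_dep->chainA) on its tasks, while B::run_all() would use depend(inout: m_dep->chainB).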
In fact, I would argue that if tasks need to be executed serially, it is likely because they share something in common (implicit data). Dependencies can sometimes be used purely to constrain the scheduling (for example, when tasks require a lot of memory and cannot be executed simultaneously), but even in that case, using an explicit dependency object helps keep the code maintainable.
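Here is a sketch of that scheduling-constraint case (SchedConstraint, memGate, heavyTaskA and heavyTaskB are hypothetical names): two otherwise independent, memory-hungry tasks are kept from running at the same time by making both depend on the same tag.
struct SchedConstraint
{
    char memGate;
};
void heavyTaskA() { /*some code that needs a lot of memory*/ }
void heavyTaskB() { /*some code that needs a lot of memory*/ }
void spawn_big_tasks(SchedConstraint* c)
{
    // Both tasks depend (inout) on the same tag, so the runtime never
    // runs them simultaneously (with inout they also get a fixed order;
    // the mutexinoutset type mentioned below relaxes that).
    #pragma omp task depend(inout: c->memGate)
    heavyTaskA();
    #pragma omp task depend(inout: c->memGate)
    heavyTaskB();
}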
This dependency variable is actually useful to OpenMP runtimes. A runtime can use the address of the provided variable to retrieve the dependent tasks efficiently. Runtimes do not even need to allocate any internal objects for the task dependency, nor to control the lifetime of such objects. That burden is left to the programmer, who can often handle it more efficiently than the runtime.
Note that some advanced OpenMP runtimes can track which storage locations the task dependencies refer to, in order to choose the place on which to schedule each task. For example, it is often a bit more efficient to execute tasks working on the same storage location on places close to each other (to improve locality). Thus, providing the actual data the tasks operate on may result in better scheduling (e.g. on a NUMA-aware OpenMP runtime).
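A minimal sketch of that idea, assuming the tasks really do operate on a shared buffer (process, stage1 and stage2 are hypothetical stand-ins for the real work): the buffer itself serves as the dependency object, so no dummy tag is needed and the runtime can see which memory the tasks touch.
#include <cstddef>
void stage1(double* buf, std::size_t n) { /*some code*/ }
void stage2(double* buf, std::size_t n) { /*some code*/ }
void process(double* buf, std::size_t n)
{
    // Both tasks name the same storage location (the address of
    // buf[0] is what the runtime matches on), so they serialize.
    #pragma omp task depend(inout: buf[0])
    stage1(buf, n);
    #pragma omp task depend(inout: buf[0])
    stage2(buf, n);
}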
Note that the dependence type mutexinoutset could be useful if you want the tasks to be executed serially but do not care about their order.
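For reference, a minimal sketch reusing the seqTag field from the code above (mutexinoutset requires an OpenMP 5.0 compliant compiler): the tasks become mutually exclusive but may run in any order.
#pragma omp task depend(mutexinoutset: m_dep->seqTag)
f1a();
#pragma omp task depend(mutexinoutset: m_dep->seqTag)
f1b();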
Upvotes: 2