OpenMP tasks crash once the "spawning routine" exits

I have the following program structure:

type(mytype) :: x(10)

!$OMP PARALLEL DEFAULT(SHARED)
!$OMP MASTER
do i = 1, 10
   call sub1(x(i)) ! some stuff that cannot be parallelized
   call sub2(x(i)) ! some stuff that can be parallelized once sub1 has completed
end do
!$OMP END MASTER
!$OMP END PARALLEL

[.....]

subroutine sub2(x)
   type(mytype), intent(inout) :: x
   integer :: j, k
   integer, allocatable :: v(:)
   allocate( v(10000) ) 
   v(:) = ... ! filling up v
   do j = 1, 1000
      !$OMP TASK DEFAULT(NONE) SHARED(x) PRIVATE(k) FIRSTPRIVATE(j,v)
         !...working on x
      !$OMP END TASK
   end do
end subroutine

But it doesn't behave as expected:

- The master thread exits sub2() only after many of the 1000 tasks (which take several seconds each) have completed. Is it because initializing the tasks takes a long time (in particular, duplicating the firstprivate arrays for each task)?
- More annoyingly, as soon as the master thread exits sub2(), the program crashes in one of the tasks with a memory error ("null pointer dereference or unaligned memory access" message in the debugger). I don't understand why this happens, since:
   - all the dummy arguments that are used in a task are declared as SHARED, and therefore should remain accessible as long as the program is in the parallel region
   - all the local variables that are used in a task are declared as PRIVATE or FIRSTPRIVATE, and therefore should have the same lifetime as the task itself

What am I missing here?

The compiler is Intel ifort 2018

Upvotes: 0

Answers (2)

PierU

Reputation: 2688

According to the comments and some additional tests I made, it turns out that a routine in which tasks have been spawned must not be exited until all of those tasks have completed. One solution is to put a taskwait directive at the end of the routine, but that defeats my original objective, which was for the master thread to execute all the iterations of sub1() while the tasks were being executed. I ended up with a solution that consists in enclosing the call to sub2() in a task: that way, the master thread can continue executing without waiting for sub2() to exit:

type(mytype) :: x(10)

!$OMP PARALLEL DEFAULT(SHARED)
!$OMP MASTER
do i = 1, 10
   call sub1(x(i)) ! some stuff that cannot be parallelized
   !$OMP TASK
   call sub2(x(i)) ! some stuff that can be parallelized once sub1 has completed
   !$OMP END TASK
end do
!$OMP END MASTER
!$OMP END PARALLEL

[.....]

subroutine sub2(x)
   type(mytype), intent(inout) :: x
   integer :: j, k
   integer, allocatable :: v(:)
   allocate( v(10000) ) 
   v(:) = ... ! filling up v
   do j = 1, 1000
      !$OMP TASK DEFAULT(NONE) SHARED(x) PRIVATE(k) FIRSTPRIVATE(j,v)
         !...working on x
      !$OMP END TASK
   end do
   !$OMP TASKWAIT 
end subroutine

Update

After more tests, and with the help of the other answers and comments, I now have a better understanding of the problem.

First, the problem comes only from the shared variables, not from the firstprivate (and obviously not from the private) ones. Even if the tasks actually start after sub2() has exited, the firstprivate variables are correctly initialized (it seems that the task scheduler builds the whole data environment of a task when it creates it, not when it starts it).
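
To illustrate the firstprivate behaviour, here is a minimal sketch (not the original code: the names spawn and res and the array sizes are made up for the example). Each task receives its own copy of j and v when it is created, so overwriting v in later iterations does not affect tasks that are still pending. A taskwait is kept at the end of the routine so that the shared dummy argument remains valid while the tasks run.

```fortran
program firstprivate_capture_demo
   implicit none
   integer :: results(8)   ! hypothetical output array, one slot per task

   results = 0
   !$omp parallel default(shared)
   !$omp master
   call spawn(results)
   !$omp end master
   !$omp end parallel

   print *, results   ! prints 4 8 12 16 20 24 28 32

contains

   subroutine spawn(res)
      integer, intent(inout) :: res(8)   ! explicit shape dummy argument
      integer :: j
      integer, allocatable :: v(:)

      allocate( v(4) )
      do j = 1, 8
         v(:) = j   ! v is overwritten at every iteration
         ! j and v are copied into the task when the task is created
         !$omp task default(none) shared(res) firstprivate(j, v)
         res(j) = sum(v)   ! uses the task's own copies: 4*j
         !$omp end task
      end do
      ! wait for the child tasks before res and v go out of scope
      !$omp taskwait
   end subroutine spawn

end program firstprivate_capture_demo
```

It can be built with OpenMP enabled, for instance with gfortran -fopenmp or ifort -qopenmp.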

More important, I am actually passing an array section, not a scalar, and this has some consequences. The original code was therefore actually more like this (focusing here on the shared variables):

type(mytype) :: x(100)

!$OMP PARALLEL DEFAULT(SHARED)
!$OMP MASTER
do i = 1, 100, 10
   call sub1( x(i:i+9) ) ! some stuff that cannot be parallelized
   call sub2( x(i:i+9) ) ! some stuff that can be parallelized once sub1 has completed
end do
!$OMP END MASTER
!$OMP END PARALLEL

[.....]

subroutine sub2(x2)
   type(mytype), intent(inout) :: x2(:) ! assumed shape interface
   type(mytype), allocatable :: y(:) ! local array
   allocate( y(size(x2)) )
   do j = 1, 1000
      !$OMP TASK DEFAULT(NONE) SHARED(x2,y)
         !...working on x2, y being a work array
      !$OMP END TASK
   end do
end subroutine

What happens with y(:) is easy: once sub2() exits, y is out of scope and any reference to it may fail or return garbage.

About x2(:): the dummy argument x2(:) is "assumed shape", which means that what is actually passed to sub2() is not a simple address (C pointer) but a (hidden) temporary array descriptor (which contains the sizes, the rank, the strides... in addition to the address). Once sub2() exits, the address is still valid, but it can no longer be read because the array descriptor no longer exists.

Replacing x2(:) by x2(*) (assumed size interface) or x2(n) (explicit shape interface) can solve the problem. These are the legacy interfaces of Fortran, and in this case only the base address is passed, not a descriptor. But it also depends on how the routine is called:

- call sub2( x(i:i+9) ): what is passed is a contiguous array section. In this case the compilers generally just pass the address of x(i) behind the scenes, so everything is OK.
- call sub2( x(i) ): the compilation would fail, because a scalar is passed while an array is expected.
- call sub2( x(i:i+9:3) ): what is passed is a non-contiguous array section. But since both x2(*) and x2(n) assume a contiguous array, the compiler will create a temporary contiguous copy of x(i:i+9:3) and pass the address of this copy. When sub2() exits, the temporary copy goes out of scope and the address becomes useless.
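
As a side note, the Fortran 2008 intrinsic is_contiguous() can tell in advance whether a given section is contiguous. The sketch below is only an illustration with an arbitrary integer array: it reports contiguity, it does not guarantee that the compiler won't make a temporary copy.

```fortran
program contiguity_check_demo
   implicit none
   integer :: x(100)

   x = 0
   ! unit-stride section of a plain array: contiguous
   print *, is_contiguous( x(1:10) )     ! prints T
   ! strided section: the selected elements are not consecutive in memory
   print *, is_contiguous( x(1:10:3) )   ! prints F
end program contiguity_check_demo
```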

Notes

- Even if a contiguous section is passed, the compiler is in theory free to make a temporary copy anyway (whatever the interface), which means that one can never be 100% sure...
- With the x2(:) assumed shape interface, passing a whole array (call sub2( x )) is probably safer, as the compiler can pass the existing array descriptor instead of creating a temporary descriptor that is destroyed when sub2() exits. However, not all arrays have a descriptor (it's compiler dependent, but static arrays do not necessarily need one).

When using the x2(:) assumed shape interface, there may be a solution based on Fortran pointers (quickly tested, it seems to work, but more tests would be needed to be sure):

type(mytype), TARGET :: x(100)

!$OMP PARALLEL DEFAULT(SHARED)
!$OMP MASTER
do i = 1, 100, 10
   call sub1( x(i:i+9) ) ! some stuff that cannot be parallelized
   call sub2( x(i:i+9) ) ! some stuff that can be parallelized once sub1 has completed
end do
!$OMP END MASTER
!$OMP END PARALLEL

[.....]

subroutine sub2(x2)
   type(mytype), intent(inout), TARGET :: x2(:) ! assumed shape interface
   type(mytype), allocatable :: y(:) ! local array
   type(mytype), POINTER :: p(:)
   allocate( y(size(x2)) )
   p => x2
   do j = 1, 1000
      !$OMP TASK DEFAULT(NONE) SHARED(y), FIRSTPRIVATE(p)
         !...working on p, y being a work array
      !$OMP END TASK
   end do
end subroutine

Fortran pointers differ from C pointers: there is no double, triple, ... pointer. b => a ; c => b is the same as b => a ; c => a (hence, after b => null(), c is still associated with a). In the above example, p is therefore pointing to the original x(i:i+9) section, even after x2 goes out of scope. Making it firstprivate then ensures that the tasks still have valid private copies even after sub2() has exited. And since p is just a pointer, this is a light (shallow) copy (in practice, the copy of a descriptor); making x2 itself firstprivate would be possible, but it would be a deep copy (unaffordable if x2 uses a lot of memory).
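
The pointer semantics described above can be checked with a small standalone sketch (the integer array and the section bounds are arbitrary choices for the example):

```fortran
program pointer_rebind_demo
   implicit none
   integer, target  :: a(5)
   integer, pointer :: b(:), c(:)

   a = [1, 2, 3, 4, 5]

   b => a(2:4)    ! b is associated with the section a(2:4)
   c => b         ! c is associated with b's target, i.e. a(2:4), not with b itself
   b => null()    ! disassociating b leaves c untouched

   print *, associated(b)            ! prints F
   print *, associated(c, a(2:4))    ! prints T

   c(1) = 20      ! writes through to a(2)
   print *, a(2)                     ! prints 20
end program pointer_rebind_demo
```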

Note that per the standard, this approach requires the original x array to have the target attribute. AFAIK, when both the actual and the dummy arguments have the target attribute, the compiler is not allowed to make a temporary copy.

Upvotes: 0

Joachim

Reputation: 876

I'm not a Fortran programmer, I just sometimes listen to Fortran folks. The issue might be related to the issue of passing array slices to non-blocking MPI communication routines. While the array itself is still there, the array slice meta information goes out of scope as soon as the routine returns. To me, sub2(x(i)) looks like passing an array slice. With the additional task, you have now introduced a new scope for the array slice. If you called sub2(x, i) instead, and rearranged sub2 to do the array access itself, it might solve your scoping issue.

Upvotes: 0
