max
max

Reputation: 564

Parallelizing fortran 2008 `do concurrent` systematically, possibly with openmp

The fortran 2008 do concurrent construct is a do loop that tells the compiler that no iteration affect any other. It can thus be parallelized safely.

A valid example:

program main
  implicit none
  integer :: i
  integer, dimension(10) :: array
  do concurrent( i= 1: 10)
    array(i) = i
  end do
end program main

where iterations can be done in any order. You can read more about it here.

To my knowledge, gfortran does not automatically parallelize these do concurrent loops, while I remember a gfortran-diffusion-list mail about doing it (here). It justs transform them to classical do loops.

My question: Do you know a way to systematically parallelize do concurrent loops? For instance with a systematic openmp syntax?

Upvotes: 10

Views: 6258

Answers (2)

Hristo Iliev
Hristo Iliev

Reputation: 74365

It is not that easy to do it automatically. The DO CONCURRENT construct has a forall-header which means that it could accept multiple loops, index variables definition and a mask. Basically, you need to replace:

DO CONCURRENT([<type-spec> :: ]<forall-triplet-spec 1>, <forall-triplet-spec 2>, ...[, <scalar-mask-expression>])
  <block>
END DO

with:

[BLOCK
    <type-spec> :: <indexes>]

!$omp parallel do
DO <forall-triplet-spec 1>
  DO <forall-triplet-spec 2>
    ...
    [IF (<scalar-mask-expression>) THEN]
      <block>
    [END IF]
    ...
  END DO
END DO
!$omp end parallel do

[END BLOCK]

(things in square brackets are optional, based on the presence of the corresponding parts in the forall-header)

Note that this would not be as effective as parallelising one big loop with <iters 1>*<iters 2>*... independent iterations which is what DO CONCURRENT is expected to do. Note also that forall-header permits a type-spec that allows one to define loop indexes inside the header and you will need to surround the whole thing in BLOCK ... END BLOCK construct to preserve the semantics. You would also need to check if scalar-mask-expr exists at the end of the forall-header and if it does you should also put that IF ... END IF inside the innermost loop.

If you only have array assignments inside the body of the DO CONCURRENT you would could also transform it into FORALL and use the workshare OpenMP directive. It would be much easier than the above.

DO CONCURRENT <forall-header>
  <block>
END DO

would become:

!$omp parallel workshare
FORALL <forall-header>
  <block>
END FORALL
!$omp end parallel workshare

Given all the above, the only systematic way that I can think about is to systematically go through your source code, searching for DO CONCURRENT and systematically replacing it with one of the above transformed constructs based on the content of the forall-header and the loop body.

Edit: Usage of OpenMP workshare directive is currently discouraged. It turns out that at least Intel Fortran Compiler and GCC serialise FORALL statements and constructs inside OpenMP workshare directives by surrounding them with OpenMP single directive during compilation which brings no speedup whatsoever. Other compilers might implement it differently but it's better to avoid its usage if portable performance is to be achieved.

Upvotes: 14

Chris
Chris

Reputation: 46306

I'm not sure what you mean "a way to systematically parallelize do concurrent loops". However, to simply parallelise an ordinary do loop with OpenMP you could just use something like:

!$omp parallel private (i)
!$omp do
do i = 1,10
    array(i) = i
end do
!$omp end do
!$omp end parallel

Is this what you are after?

Upvotes: 2

Related Questions