This is a follow-up to questions 36182486, 41421437 and several others. I want to speed up the assembly of skewness and mass matrices for a FEM calculation by using multiple processors to handle individual elements in parallel. This little MWE shows the guts of the operation.
!! compile with gfortran -fopenmp -o FEMassembly FEMassembly.f90
Program FEMassembly
use, intrinsic :: iso_c_binding
implicit none
real (c_double) :: arrayM(3,3)=reshape((/2.d0,1.d0,1.d0,1.d0,&
&2.d0,1.d0,1.d0,1.d0,2.d0/),(/3,3/)) ! contrib from one element
integer (c_int) :: ke,ne=4,kx,nx=6,nodes(3)
real (c_double) :: L(6,6)
integer (c_int) :: t(4,3)=reshape((/1,2,5,6,2,3,4,5,4,5,2,3/),(/4,3/))
!! first, no OMP
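L=0.d0
do ke=1,ne ! for each triangular element
nodes=t(ke,:) ! node numbers of element ke
L(nodes,nodes)=L(nodes,nodes)+arrayM ! add the element contribution
end do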
print *,'L no OMP'
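L=0.d0
!$omp parallel do private (nodes)
do ke=1,ne ! for each triangular element
nodes=t(ke,:) ! node numbers of element ke
!! !$omp atomic
L(nodes,nodes)=L(nodes,nodes)+arrayM ! unprotected update of shared L
!! !$omp end atomic
end do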
!$omp end parallel do
print *,'L with OMP and race'
End Program FEMassembly
With the atomic directives commented out, the array L contains several wrong values, presumably because of the race condition I was trying to avoid with the atomic directives. The results are:
L no OMP
2. 1. 0. 1. 0. 0.
1. 6. 1. 2. 2. 0.
0. 1. 4. 0. 2. 1.
1. 2. 0. 4. 1. 0.
0. 2. 2. 1. 6. 1.
0. 0. 1. -0. 1. 2.
L with OMP and race
2. 1. 0. 1. 0. 0.
1. 6. 1. 2. 2. 0.
0. 1. 2. 0. 2. 1.
1. 2. 0. 4. 1. 0.
0. 2. 2. 1. 6. 1.
0. 0. 1. 0. 1. 2.
If the "atomic" directives are uncommented, the compiler returns the error: Error: !$OMP ATOMIC statement must set a scalar variable of intrinsic type at (1), where (1) points to arrayM in the line L(nodes,nodes).....
What I am hoping to achieve is to have the time-consuming contributions from each element (here the trivial arrayM) computed in parallel. Since several threads address the same matrix element, something has to be done to make the sum occur in an orderly fashion. Can anyone suggest a way to do this?
In Fortran the simplest way is to use a reduction, because OpenMP for Fortran supports reductions on arrays. Below is what I think you are trying to do, but take it with a pinch of salt, because with such a small array race conditions are sometimes difficult to find.
!! compile with gfortran -fopenmp -o FEMassembly FEMassembly.f90
Program FEMassembly
use, intrinsic :: iso_c_binding
Use omp_lib, Only : omp_get_num_threads
implicit none
real (c_double) :: arrayM(3,3)=reshape((/2.d0,1.d0,1.d0,1.d0,&
&2.d0,1.d0,1.d0,1.d0,2.d0/),(/3,3/)) ! contrib from one element
integer (c_int) :: ke,ne=4,nodes(3)
real (c_double) :: L(6,6)
integer (c_int) :: t(4,3)=reshape((/1,2,5,6,2,3,4,5,4,5,2,3/),(/4,3/))
! Not declared in original program
Integer :: nx, kx
! Not set in original program
nx = Size( L, Dim = 1 )
! The reduction adds the per-thread sums to the existing value of L, so start from zero
L = 0.0_c_double
!$omp parallel default( none ) private ( ke, nodes ) shared( ne, t, L, arrayM )
!$omp single
Write( *, * ) 'Working on ', omp_get_num_threads(), ' threads'
!$omp end single
!$omp do reduction( +:L )
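do ke=1,ne ! for each triangular element
nodes=t(ke,:) ! node numbers of element ke
L(nodes,nodes)=L(nodes,nodes)+arrayM ! each thread adds into its private copy of L
end do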
!$omp end do
!$omp end parallel
End Program FEMassembly
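For comparison, here is a minimal sketch that runs the array reduction next to a scalar `atomic` variant and checks both against a serial loop. The loop body (`nodes = t(ke,:)` followed by a scatter-add of `arrayM`) is an assumption, inferred from the error message and the printed results in the question. The program name and the arrays `Lserial`, `Lred` and `Latomic` are made up for illustration. The atomic variant avoids the compiler error by updating one scalar entry of the matrix at a time, which is what `!$omp atomic` requires.

```fortran
! compile with: gfortran -fopenmp -o assembly_check assembly_check.f90
program assembly_check
  use, intrinsic :: iso_c_binding, only : c_double
  implicit none
  integer, parameter :: ne = 4
  ! contribution from one element (same values as in the question)
  real(c_double) :: arrayM(3,3) = reshape( &
       [2.d0,1.d0,1.d0, 1.d0,2.d0,1.d0, 1.d0,1.d0,2.d0], [3,3])
  ! element-to-node table (same values as in the question)
  integer :: t(4,3) = reshape([1,2,5,6, 2,3,4,5, 4,5,2,3], [4,3])
  real(c_double) :: Lserial(6,6), Lred(6,6), Latomic(6,6)
  integer :: ke, i, j, nodes(3)

  ! Serial reference; the loop body is an assumption (see text above)
  Lserial = 0.d0
  do ke = 1, ne
     nodes = t(ke,:)
     Lserial(nodes,nodes) = Lserial(nodes,nodes) + arrayM
  end do

  ! Variant 1: array reduction. Each thread sums into its own private
  ! copy of Lred; the copies are added together at the end of the loop.
  Lred = 0.d0
  !$omp parallel do private(ke, nodes) reduction(+:Lred)
  do ke = 1, ne
     nodes = t(ke,:)
     Lred(nodes,nodes) = Lred(nodes,nodes) + arrayM
  end do
  !$omp end parallel do

  ! Variant 2: atomic updates. The matrix stays shared, and every
  ! update touches exactly one scalar element, as atomic requires.
  Latomic = 0.d0
  !$omp parallel do private(ke, i, j, nodes)
  do ke = 1, ne
     nodes = t(ke,:)
     do j = 1, 3
        do i = 1, 3
           !$omp atomic
           Latomic(nodes(i),nodes(j)) = Latomic(nodes(i),nodes(j)) + arrayM(i,j)
        end do
     end do
  end do
  !$omp end parallel do

  ! The entries are sums of small integers, so exact comparison is safe here
  if (all(Lred == Lserial) .and. all(Latomic == Lserial)) then
     print '(a)', 'reduction and atomic results match the serial result'
  else
     print '(a)', 'MISMATCH between parallel and serial results'
     error stop 1
  end if
end program assembly_check
```

Which variant is preferable probably depends on the problem size. The reduction needs one private copy of the matrix per thread, which can become expensive in memory (and may need a larger thread stack) for big matrices. The atomic version keeps a single shared matrix but pays a small cost on every update.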
