Reputation: 83
I have a do
loop in a Matlab's Mex function (written in Fortran) that performs some calculations for each element of a FEM mesh. My mesh consists of 250k elements, so I thought it was worth parallelizing it. This is my first attempt at parallelizing this code with OpenMP (I'm a beginner at coding). I used the reduction
command to avoid the race condition in fintk(dofele) = fintk(dofele) + fintele
. Is it correct? I can compile it in Matlab without any problem. However, when I use it (in Matlab), it produces correct results for a 12k element mesh and it is faster than the serialized one, but when I try to use it for the 250k element mesh Matlab crashes. Thank you for helping me
subroutine loop_over_elements( &
! OUT
fintk,Sxyz,&
! IN
Elem,Bemesh,Dofelemat,u,dt,NE,NDOF)
use omp_lib
implicit none
mwSize NE, NDOF, ele
integer, parameter :: dp = selected_real_kind(15,307)
real(dp) :: fintk(NDOF), Sxyz(6,NE), Elemat(4,NE), Bemesh(6,12,NE), Dofelemat(12,NE)
real(dp) :: u(NDOF)
real(dp) :: Bele(6,12), fintele(12), uele(12), si(6), dt
integer*4 :: nodes(4), dofele(12)
fintk = 0.D0
!$OMP PARALLEL DO REDUCTION(+:fintk(:)) PRIVATE(ele,nodes,Bele,dofele,uele,si,fintele)
DO ele = 1, NE
nodes = Elemat(1:4,ele)
Bele = Bemesh(1:6,1:12,ele)
dofele = Dofelemat(1:12,ele)
uele = u(dofele)
call comput_subroutine( &
! IN
Bele,uele,dt, &
! OUT
si)
Sxyz(:,ele) = si
fintele = MATMUL(TRANSPOSE(Bele),si)
fintk(dofele) = fintk(dofele) + fintele
END DO
!$OMP END PARALLEL DO
return
end
Upvotes: 2
Views: 326
Reputation: 83
I solved the "crashing" of Matlab that I was experiencing by adding this line in the general mexFunction
subroutine before calling the loop_over_elements
subroutine:
call KMP_SET_STACKSIZE(100000000)
. I figured that, since Matlab didn't crash when I was using the large model with the non-parallelized subroutine, maybe it was a memory problem. After that I discovered the well-known (not to me unfortunately) segmentation fault problem when using OpenMp with large arrays (see for example this). I'm still a bit confused about the difference between setting OMP_STACKSIZE
(which I don't know how to do in Mex functions) and KMP_SET_STACKSIZE
, but now the parallelized code works with the large model.
Upvotes: 2