Kadiya

Reputation: 83

Do loop to be parallelized in a MATLAB MEX function (Fortran) with OpenMP

I have a do loop in a MATLAB MEX function (written in Fortran) that performs some calculations for each element of a FEM mesh. My mesh consists of 250k elements, so I thought it was worth parallelizing. This is my first attempt at parallelizing this code with OpenMP (I'm a beginner at coding). I used the reduction clause to avoid the race condition in fintk(dofele) = fintk(dofele) + fintele. Is it correct? It compiles in MATLAB without any problem, and for a 12k-element mesh it produces correct results and is faster than the serial version. However, when I try to use it on the 250k-element mesh, MATLAB crashes. Thank you for helping me.

  subroutine loop_over_elements( &
  ! OUT  
  fintk,Sxyz,&
  ! IN                    
  Elemat,Bemesh,Dofelemat,u,dt,NE,NDOF)

  use omp_lib
  
  implicit none
  
  mwSize NE, NDOF, ele
  integer,  parameter   :: dp = selected_real_kind(15,307)
  real(dp) :: fintk(NDOF), Sxyz(6,NE), Elemat(4,NE), Bemesh(6,12,NE), Dofelemat(12,NE)
  real(dp) :: u(NDOF)  
  
  real(dp) :: Bele(6,12), fintele(12), uele(12), si(6), dt
  integer*4  :: nodes(4), dofele(12) 
  
  fintk = 0.D0
  
  !$OMP PARALLEL DO REDUCTION(+:fintk) PRIVATE(ele,nodes,Bele,dofele,uele,si,fintele)
      DO ele = 1, NE 
        nodes   = Elemat(1:4,ele)
        Bele    = Bemesh(1:6,1:12,ele)
        dofele  = Dofelemat(1:12,ele)
        uele    = u(dofele)
        
        call comput_subroutine( &
!       IN
        Bele,uele,dt, &
!       OUT                                        
        si) 
        
        Sxyz(:,ele)   = si
        fintele       = MATMUL(TRANSPOSE(Bele),si)
        fintk(dofele) = fintk(dofele) + fintele
        
      END DO

  !$OMP END PARALLEL DO

  return
  end subroutine loop_over_elements

Upvotes: 2

Views: 326

Answers (1)

Kadiya

Reputation: 83

I solved the MATLAB crash by adding this line in the gateway mexFunction subroutine, before the call to the loop_over_elements subroutine: call KMP_SET_STACKSIZE(100000000). I figured that, since MATLAB didn't crash when I ran the large model with the non-parallelized subroutine, it might be a memory problem. I then discovered the well-known (not to me, unfortunately) segmentation-fault issue when using OpenMP with large arrays (see for example this). I'm still a bit confused about the difference between setting OMP_STACKSIZE (which I don't know how to do in MEX functions) and calling KMP_SET_STACKSIZE, but the parallelized code now works with the large model.
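For context, the fix amounts to a single runtime call in the gateway routine before the first parallel region is entered. This is a minimal sketch only (the input/output marshalling is omitted, and the argument names are the standard MEX gateway ones); note that KMP_SET_STACKSIZE is an extension of the Intel OpenMP runtime, so with gfortran/libgomp you would instead set the OMP_STACKSIZE environment variable before starting MATLAB:

```fortran
#include "fintrf.h"

      subroutine mexFunction(nlhs, plhs, nrhs, prhs)
      use omp_lib
      implicit none
      mwPointer plhs(*), prhs(*)
      integer :: nlhs, nrhs

!     Each worker thread gets its own copy of the PRIVATE arrays, and
!     those copies live on the thread's stack. The default thread stack
!     (a few MB) is too small for the large model, so raise it to
!     ~100 MB per thread BEFORE the OpenMP runtime creates the team;
!     calling it after the first parallel region has no effect.
      call KMP_SET_STACKSIZE(100000000)

!     ... convert prhs/plhs with the usual mx* routines, then:
!     call loop_over_elements(fintk, Sxyz, Elemat, Bemesh, Dofelemat,
!    &                        u, dt, NE, NDOF)

      end subroutine mexFunction
```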

Upvotes: 2
