Segmentation fault using SCALAPACK in Fortran? No backtrace?

I'm trying to find the eigenvalues and eigenvectors of a Hermitian matrix using SCALAPACK and MPI in Fortran. To track down the bug, I made this program as simple as possible, but I still get a segmentation fault. Following the answers given to people with similar questions, I've tried changing all of my integers to integer*8 and all of my reals to real*8 or real*16, but the problem persists. Most interestingly, I don't even get a backtrace for the segmentation fault: the program hangs while trying to print the backtrace and has to be aborted manually.

Please forgive my lack of knowledge; I'm not familiar with most programming concepts, but I've done my best. Here is my code:

PROGRAM easydiag
  IMPLICIT NONE
  INCLUDE 'mpif.h'
  EXTERNAL BLACS_EXIT, BLACS_GET, BLACS_GRIDEXIT, BLACS_GRIDINFO
  EXTERNAL BLACS_GRIDINIT, BLACS_PINFO,BLACS_SETUP, DESCINIT 
  INTEGER,EXTERNAL::NUMROC,ICEIL
  REAL*8,EXTERNAL::PDLAMCH

  INTEGER,PARAMETER::XNDIM=4 ! MATRIX WILL BE XNDIM BY XNDIM
  INTEGER,PARAMETER::EXPND=XNDIM
  INTEGER,PARAMETER::NPROCS=1

  INTEGER COMM,MYID,ROOT,NUMPROCS,IERR,STATUS(MPI_STATUS_SIZE)
  INTEGER NUM_DIM
  INTEGER NPROW,NPCOL
  INTEGER CONTEXT, MYROW, MYCOL

  COMPLEX*16,ALLOCATABLE::HH(:,:),ZZ(:,:),MATTODIAG(:,:)
  REAL*8:: EIG(2*XNDIM) ! EIGENVALUES
  CALL MPI_INIT(ierr)
  CALL MPI_COMM_RANK(MPI_COMM_WORLD,myid,ierr)
  CALL MPI_COMM_SIZE(MPI_COMM_WORLD,numprocs,ierr)
  ROOT=0

  NPROW=INT(SQRT(REAL(NPROCS)))
  NPCOL=NPROCS/NPROW   
  NUM_DIM=2*EXPND/NPROW

  CALL SL_init(CONTEXT,NPROW,NPCOL)
  CALL BLACS_GRIDINFO( CONTEXT, NPROW, NPCOL, MYROW, MYCOL )

  ALLOCATE(MATTODIAG(XNDIM,XNDIM),HH(NUM_DIM,NUM_DIM),ZZ(NUM_DIM,NUM_DIM))
  MATTODIAG=0.D0

  CALL MAKEHERMMAT(XNDIM,MATTODIAG)

  CALL MPIDIAGH(EXPND,MATTODIAG,ZZ,MYROW,MYCOL,NPROW,NPCOL,NUM_DIM,CONTEXT,EIG)

  DEALLOCATE(MATTODIAG,HH,ZZ)

  CALL MPI_FINALIZE(IERR)

END

!****************************************************
SUBROUTINE MAKEHERMMAT(XNDIM,MATTODIAG)
  IMPLICIT NONE
  INTEGER:: XNDIM, I, J, COUNTER
  COMPLEX*16:: MATTODIAG(XNDIM,XNDIM)
  REAL*8:: RAND

  COUNTER = 1
  DO J=1,XNDIM
    DO I=J,XNDIM
        MATTODIAG(I,J)=COUNTER
        COUNTER=COUNTER+1
    END DO
  END DO

END
!****************************************************
SUBROUTINE MPIDIAGH(EXPND,A,Z,MYROW,MYCOL,NPROW,NPCOL,NUM_DIM,CONTEXT,W)
  IMPLICIT NONE
  EXTERNAL DESCINIT 
  REAL*8,EXTERNAL::PDLAMCH

  INTEGER EXPND,NUM_DIM
  INTEGER CONTEXT
  INTEGER MYCOL,MYROW,NPROW,NPCOL
  COMPLEX*16 A(NUM_DIM,NUM_DIM), Z(NUM_DIM,NUM_DIM)
  REAL*8 W(2*EXPND)

  INTEGER N
  CHARACTER JOBZ, RANGE, UPLO
  INTEGER IL,IU,IA,JA,IZ,JZ
  INTEGER LIWORK,LRWORK,LWORK
  INTEGER M, NZ, INFO

  REAL*8  ABSTOL, ORFAC, VL, VU

  INTEGER DESCA(50), DESCZ(50)
  INTEGER IFAIL(2*EXPND), ICLUSTR(2*NPROW*NPCOL)
  REAL*8 GAP(NPROW*NPCOL)
  INTEGER,ALLOCATABLE:: IWORK(:)
  REAL*8,ALLOCATABLE :: RWORK(:)
  COMPLEX*16,ALLOCATABLE::WORK(:)

  N=2*EXPND
  JOBZ='V'
  RANGE='I'
  UPLO='U' ! This should be U rather than L
  VL=0.d0
  VU=0.d0
  IL=1  ! EXPND/2+1
  IU=2*EXPND  !  EXPND+(EXPND/2)   ! HERE IS FOR THE CUTTING OFF OF THE STATE
  M=IU-IL+1
  ORFAC=-1.D0
  IA=1
  JA=1
  IZ=1
  JZ=1

  ABSTOL=PDLAMCH( CONTEXT, 'U')
  CALL DESCINIT( DESCA, N, N, NUM_DIM, NUM_DIM, 0, 0, CONTEXT, NUM_DIM, INFO )
  CALL DESCINIT( DESCZ, N, N, NUM_DIM, NUM_DIM, 0, 0, CONTEXT, NUM_DIM, INFO )

  LWORK = -1
  LRWORK = -1
  LIWORK = -1
  ALLOCATE(WORK(LWORK))
  ALLOCATE(RWORK(LRWORK))
  ALLOCATE(IWORK(LIWORK))

  CALL PZHEEVX( JOBZ, RANGE, UPLO, N, A, IA, JA, DESCA, VL, &
                VU, IL, IU, ABSTOL, M, NZ, W, ORFAC, Z, IZ, &
                JZ, DESCZ, WORK, LWORK, RWORK, LRWORK, IWORK, &
                LIWORK, IFAIL, ICLUSTR, GAP, INFO )

  LWORK = INT(ABS(WORK(1)))
  LRWORK = INT(ABS(RWORK(1)))
  LIWORK = INT(ABS(IWORK(1)))

  DEALLOCATE(WORK)
  DEALLOCATE(RWORK)
  DEALLOCATE(IWORK)

  ALLOCATE(WORK(LWORK))
  ALLOCATE(RWORK(LRWORK))
  ALLOCATE(IWORK(LIWORK))

  PRINT*, LWORK, LRWORK, LIWORK

  CALL PZHEEVX( JOBZ, RANGE, UPLO, N, A, IA, JA, DESCA, VL, &
                VU, IL, IU, ABSTOL, M, NZ, W, ORFAC, Z, IZ, &
                JZ, DESCZ, WORK, LWORK, RWORK, LRWORK, IWORK, &
                LIWORK, IFAIL, ICLUSTR, GAP, INFO )

  RETURN
END

The problem is in the second call to PZHEEVX. I'm fairly sure I'm using it correctly, since this code is a simplified version of a more complicated code that works fine. For this test I'm only using one processor.

Help!

Answers (2)

Andras Deak -- Слава Україні

There's a fishy piece of dimensioning in your code which can easily be responsible for the segfault. In your main program you set

EXPND=XNDIM=4
NUM_DIM=2*EXPND !NPROW==1 for a single-process test
ALLOCATE(MATTODIAG(XNDIM,XNDIM))   ! MATTODIAG(4,4)

Then you pass your MATTODIAG, the Hermitian matrix, to

CALL MPIDIAGH(EXPND,MATTODIAG,ZZ,MYROW,...)

which is in turn defined as

SUBROUTINE MPIDIAGH(EXPND,A,Z,MYROW,...)

COMPLEX*16 A(NUM_DIM,NUM_DIM)   ! A(8,8)

This is already an inconsistency, which can corrupt the computations in that subroutine even without causing a segfault. Furthermore, both the subroutine and ScaLAPACK (which is told N=2*EXPND=8 through DESCINIT and PZHEEVX) treat A as an (8,8) array, while you only allocated (4,4) in the main program. This lets the subroutine read and write past the end of the allocated memory.
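One way to avoid this kind of mismatch is to derive the local array sizes from the global order N rather than from XNDIM. The module below is a sketch under stated assumptions. It is a hypothetical re-implementation of the block-cyclic count that ScaLAPACK's NUMROC performs. The question already declares NUMROC as EXTERNAL, and real code should simply call that. The names local_dims and local_extent are made up for illustration.

```fortran
module local_dims
  implicit none
contains
  ! How many entries of an n-long, block-cyclically distributed dimension
  ! land on process iproc. Hypothetical helper that mirrors the logic of
  ! ScaLAPACK's NUMROC; call NUMROC itself in real code.
  pure integer function local_extent(n, nb, iproc, isrcproc, nprocs) result(cnt)
    integer, intent(in) :: n        ! global extent (N passed to PZHEEVX)
    integer, intent(in) :: nb       ! block size (MB or NB given to DESCINIT)
    integer, intent(in) :: iproc    ! this process's row or column coordinate
    integer, intent(in) :: isrcproc ! process owning the first block (0 here)
    integer, intent(in) :: nprocs   ! NPROW or NPCOL
    integer :: mydist, nblocks, extrablks

    mydist = mod(nprocs + iproc - isrcproc, nprocs)
    nblocks = n / nb                   ! number of complete blocks
    cnt = (nblocks / nprocs) * nb      ! complete blocks every process gets
    extrablks = mod(nblocks, nprocs)   ! complete blocks left over
    if (mydist < extrablks) then
      cnt = cnt + nb                   ! one extra complete block
    else if (mydist == extrablks) then
      cnt = cnt + mod(n, nb)           ! the trailing partial block
    end if
  end function local_extent
end module local_dims
```

Consider the question's single-process run, with N = 2*EXPND = 8 and block size NUM_DIM = 8. Here local_extent(8, 8, 0, 0, 1) returns 8, but MATTODIAG was allocated with only XNDIM = 4 rows and columns. The matrix passed as A would need at least 8 local rows and 8 local columns to match what PZHEEVX is told.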

roygvib

According to this page, setting LWORK = -1 asks the PZHEEVX routine to return the necessary sizes of all the work arrays. For example:

If LWORK = -1, then LWORK is global input and a workspace query is assumed; the routine only calculates the optimal size for all work arrays. Each of these values is returned in the first entry of the corresponding work array, and no error message is issued by PXERBLA.

Similar explanations can be found for LRWORK = -1. As for IWORK,

IWORK (local workspace) INTEGER array. On return, IWORK(1) contains the amount of integer workspace required.

but in your program the work arrays are allocated as

LWORK = -1
LRWORK = -1
LIWORK = -1
ALLOCATE(WORK(LWORK))
ALLOCATE(RWORK(LRWORK))
ALLOCATE(IWORK(LIWORK))

and after the first call of PZHEEVX, the sizes of the work arrays are obtained as

LWORK = INT(ABS(WORK(1)))
LRWORK = INT(ABS(RWORK(1)))
LIWORK =INT (ABS(IWORK(1)))

which looks inconsistent: the arrays are allocated with a length of -1, yet their first element is read back. So it is better to modify the allocation as (*)

allocate( WORK(1), RWORK(1), IWORK(1) )
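To illustrate the two-phase pattern, here is a self-contained sketch. The routine mock_solver is hypothetical. It stands in for PZHEEVX only in that it follows the same LWORK = -1 query convention, and it is not part of ScaLAPACK. Its workspace requirement of 2*n + 10 is invented for the example. The query is made with a one-element array, the required size is read back from WORK(1), and only then is the full array allocated.

```fortran
module workspace_query_demo
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Hypothetical stand-in for a LAPACK/ScaLAPACK-style routine:
  ! with lwork = -1 it only stores the workspace size it needs in work(1).
  subroutine mock_solver(n, work, lwork, info)
    integer, intent(in) :: n, lwork
    complex(dp), intent(inout) :: work(*)
    integer, intent(out) :: info
    integer :: need

    need = 2*n + 10                    ! invented requirement
    info = 0
    if (lwork == -1) then              ! workspace query only
      work(1) = cmplx(need, 0, kind=dp)
    else if (lwork < need) then
      info = -3                        ! workspace too small
    else
      work(1:need) = (0.0_dp, 0.0_dp)  ! pretend to use the workspace
    end if
  end subroutine mock_solver

  ! Query with a one-element array first, then allocate the real size.
  subroutine run_with_query(n, lwork, info)
    integer, intent(in) :: n
    integer, intent(out) :: lwork, info
    complex(dp), allocatable :: work(:)

    allocate(work(1))                  ! not work(-1), which has no elements
    lwork = -1
    call mock_solver(n, work, lwork, info)
    if (info /= 0) return
    lwork = int(real(work(1), dp))     ! required size comes back in work(1)
    deallocate(work)
    allocate(work(lwork))
    call mock_solver(n, work, lwork, info)
  end subroutine run_with_query
end module workspace_query_demo
```

In MPIDIAGH the same steps would apply to each of WORK, RWORK and IWORK: allocate one element, query, then reallocate with the returned size.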

An example on this page also seems to allocate the work arrays this way. Another point of concern is that INT() is used in several places, for example NPROW=INT(SQRT(REAL(NPROCS))). I guess it might be better to use NINT() there to avoid the effect of round-off errors.
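Here is a minimal sketch of the NINT suggestion applied to the process-grid computation from the question. The module and routine names are hypothetical. The loop that steps NPROW down until it divides NPROCS is an addition of this sketch, not something from the question. It is needed because NINT alone can round up: NINT(SQRT(8.0)) = 3 would give NPCOL = 8/3 = 2 and a grid smaller than NPROCS.

```fortran
module grid_shape
  implicit none
contains
  ! Choose a near-square NPROW x NPCOL grid with NPROW*NPCOL == NPROCS
  ! (assumes nprocs >= 1). NINT guards against SQRT returning slightly
  ! less than an exact root; stepping down keeps NPROW a divisor of NPROCS.
  pure subroutine choose_grid(nprocs, nprow, npcol)
    integer, intent(in)  :: nprocs
    integer, intent(out) :: nprow, npcol

    nprow = max(1, nint(sqrt(real(nprocs))))
    do while (mod(nprocs, nprow) /= 0)
      nprow = nprow - 1                ! terminates at 1 at the latest
    end do
    npcol = nprocs / nprow
  end subroutine choose_grid
end module grid_shape
```

For the question's NPROCS = 1 this gives a 1 x 1 grid, the same as the original INT-based code.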

(*) More precisely, allocating an array with a size of -1 does not give a usable array: its size becomes 0 (thanks to @francescalus), so WORK(1), RWORK(1) and IWORK(1) do not exist. You can verify this by printing size(a) or a(:). To catch this kind of error, it is very useful to compile with run-time checks such as -fcheck=all (gfortran) or -check (ifort).
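A small self-contained check of the footnote follows, along the lines of printing size(a). The function allocated_size is hypothetical. It allocates a complex work array of a given length and reports how many elements it really has. With -1 the bounds are 1:-1, so the allocation succeeds but the array is empty.

```fortran
module zero_size_demo
  implicit none
contains
  ! Allocate a complex work array of length n and return its actual size.
  ! Hypothetical helper for illustration only.
  integer function allocated_size(n) result(s)
    integer, intent(in) :: n
    complex(kind(1.0d0)), allocatable :: work(:)

    allocate(work(n))   ! n = -1 gives bounds 1:-1, i.e. zero elements
    s = size(work)
    deallocate(work)
  end function allocated_size
end module zero_size_demo
```

With gfortran's -fcheck=all, a reference such as WORK(1) in your own code (as in LWORK = INT(ABS(WORK(1)))) is then reported as a bounds error instead of silently touching memory outside the array. Accesses made inside the ScaLAPACK library itself are not checked unless the library was also built with such options.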
