Sathya Anantharajah
Sathya Anantharajah

Reputation: 77

MPI FORTRAN error (generating rand num in a array and simple calculation)

Here is the basic info about the code I wrote here. I recently started to learn and write MPI. The purpose of the this code to figure out the computation time for given number of total processes. I am going to post the full code. it's 97 lines long, but I will point out it where I think the problem is..

Basic idea
(1) User will input a constant(K) and a integer(M),then BCAST both vaules.
(2) Three 1D arrays(A,B,C) will be allocated with M blocks.
(3) A subroutine(init_random_seed) will fill arrays A and B with M random numbers and BCAST it.
(4) Array C was filled with zeros and send to process==1, and at process==1 a simple math calculation will be done among arrays A and B.
(5) Results from each iteration will be stored in array C and send to process==2 by using MPI_SEND.
(6) Finally at process==2, it will write the results of C in a text file.

so here is the code,

   MODULE MPI    !!! I usually initialize all the variables here
     INCLUDE   'mpif.h' 
     REAL      :: U,V,K
     REAL      :: START,FINISH
     INTEGER   :: O,M,FILE
     INTEGER   :: MYID,TOTPS, IERR
     REAL,ALLOCATABLE,DIMENSION(:)   :: A,B,C
   END MODULE MPI


   PROGRAM CRAFT !!! main program 
     USE   MPI  
     CALL  MPIINIT  
     CALL  CPU_TIME(START)
     CALL  TEST
     CALL  CPU_TIME(FINISH) 
     PRINT*, " TOTAL PROCESSING TIME = " , FINISH - START , "SECONDS AT PROCESS", MYID
     CALL  MPI_FINALIZE(IERR)  
     STOP
   END PROGRAM CRAFT

   SUBROUTINE MPIINIT   
     USE  MPI
     CALL MPI_INIT( IERR ) 
     CALL MPI_COMM_RANK(MPI_COMM_WORLD,MYID,IERR)
     CALL MPI_COMM_SIZE(MPI_COMM_WORLD,TOTPS,IERR)
     RETURN
   END SUBROUTINE MPIINIT


   SUBROUTINE TEST
   USE MPI
   CALL INITIAL
   CALL WORK 
   CALL COLLECT
   END SUBROUTINE TEST

   SUBROUTINE INITIAL   !!! random number input and BCAST
   USE MPI 
   CALL MPI_BARRIER(MPI_COMM_WORLD,IERR) 
   !! I'm not sure if a barrier is necessary or not here.
   IF (MYID .EQ. 0) THEN
   PRINT*, "ENTER A CONSTANT"
   READ*,  K
   PRINT*, "HOW MANY TERMS?"
   READ*, M
   END IF 
   ALLOCATE(A(M),B(M),C(M))
   CALL INIT_RANDOM_SEED() !!! see the very last subroutine 
   DO O =1,M
       CALL RANDOM_NUMBER(U)
       CALL RANDOM_NUMBER(V)
       A(O) = U*10
       B(O) = V*10
       C(O) = 0.0
   END DO
   CALL MPI_BCAST(K,1,MPI_REAL,0,MPI_COMM_WORLD,IERR)
   CALL MPI_BCAST(A,M,MPI_REAL,0,MPI_COMM_WORLD,IERR) 
   CALL MPI_BCAST(B,M,MPI_REAL,0,MPI_COMM_WORLD,IERR)
   CALL MPI_BCAST(M,1,MPI_INTEGER,0,MPI_COMM_WORLD,IERR)
   CALL MPI_SEND(C,M,MPI_REAL,1,0,MPI_COMM_WORLD,IERR)  !! tag value is 0
   END SUBROUTINE INITIAL

   SUBROUTINE WORK   !!! simple math calculations 
   USE MPI
   IF(MYID .EQ. 1) THEN
   CALL MPI_RECV(C,M,MPI_REAL,0,0,MPI_COMM_WORLD,MPISTTS,IERR)
   DO O = 1,M
           C(O) = (1/K)*( A(O)**K - K*B(0))
   END DO
   CALL MPI_SEND(C,M,MPI_REAL,2,1,MPI_COMM_WORLD,IERR) !! tag value is 1
   END IF
   END SUBROUTINE WORK

   SUBROUTINE COLLECT !! writing txt files
   USE MPI 
   IF (MYID .EQ. 2) THEN 
   CALL MPI_RECV(C,M,MPI_REAL,1,1,MPI_COMM_WORLD,MPISTTS,IERR)
   OPEN(UNIT=11,FILE="ARRAY.TXT",ACTION="WRITE") 
   DO O =1,M   
   WRITE(11,'(I2,2X,F4.1,2X,F4.1,2X,F4.1)') O, A(0),B(O),C(O)
   END DO
   CLOSE(11)
   END IF 
   END SUBROUTINE COLLECT

   SUBROUTINE INIT_RANDOM_SEED() !! I found this subroutine on online 
       IMPLICIT NONE 
       INTEGER :: I,N,CLOCK
       INTEGER, DIMENSION(:), ALLOCATABLE :: SEED
       CALL RANDOM_SEED(SIZE=N)
       ALLOCATE(SEED(N))
       CALL SYSTEM_CLOCK(COUNT=CLOCK)
       SEED = CLOCK + 37 * (/ (I - 1, I = 1, N) /)
       CALL RANDOM_SEED(PUT = SEED)
       DEALLOCATE(SEED)
    END SUBROUTINE INIT_RANDOM_SEED

The reason why I'm here
* the program complies but I think I am getting a runtime error. Here is the error,

      ENTER A CONSTANT
       2
        HOW MANY TERMS?
       3
       [sflogin0:11103] *** An error occurred in MPI_Bcast
       [sflogin0:11103] *** on communicator MPI_COMM_WORLD
       [sflogin0:11103] *** MPI_ERR_TRUNCATE: message truncated
       [sflogin0:11103] *** MPI_ERRORS_ARE_FATAL (your MPI job will now                                abort)
         TOTAL PROCESSING TIME =  2.9265954 SECONDS AT PROCESS 0
       --------------------------------------------------------------------------
       mpirun has exited due to process rank 1 with PID 11103 on
       node sflogin0 exiting without calling "finalize". This may
       have caused other processes in the application to be
       terminated by signals sent by mpirun (as reported here).
       --------------------------------------------------------------------------
       [sflogin0:11099] 2 more process has sent help message help-mpi-errors.txt / mpi_errors_are_fatal
       [sflogin0:11099] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

Note: I only gave five processes when i was running it mpirun -np 5 ./a.out
Please take a look at it and help me out. Thanks

Upvotes: 3

Views: 623

Answers (2)

innoSPG
innoSPG

Reputation: 4656

In the SUBROUTINE INITIAL you have to:

  1. Broadcast M before doing the allocation; apart from process 0, no other process is really allocating. With default integer to zero, they are allocating zero size vector, and this will be a problem in broadcast.

    CALL MPI_BCAST(M,1,MPI_INTEGER,0,MPI_COMM_WORLD,IERR)
    
    ALLOCATE(A(M),B(M),C(M))
    

The next bullets are not really part of the current problem, but the next problems you will face.

  1. If only one process is getting the random numbers as you described and implement, that part should be done in an if, to make sure that only that process is doing that part.

    IF (MYID == 0) THEN
        !generate the numbers
    END IF
    
  2. I strongly recommend you group the MPI_SEND and MPI_RECV next to each other. In that respect you should move the MPI_SEND from SUBROUTINE INITIAL to SUBROUTINE WORK and also move the other one from SUBROUTINE WORK to SUBROUTINE COLLECT. This will save you time in debugging. Do not forget to also put the first MPI_SEND in an IF statement

    IF (MYID .EQ. 0) THEN
        CALL MPI_SEND(C,M,MPI_REAL,1,0,MPI_COMM_WORLD,IERR)  !! tag value is 0
    END IF 
    
  3. I also recommend using files for all input instead of standard input when doing parallel.

Finally, just to give you a hint on code organization, the full code might look like (see bellow). O for variable name is a very bad idea, I changed it to I and set it as local variable. Now, you want to make sure that the index 0 was really for zero and not a typo where you wanted to put O. Also, I added IMPLICIT NONE to force the declaration of all variable. I took the subroutine that is all MPI into the module and all other in the CONTAINS section of the PROGRAM.

    MODULE MPI    !!! I usually initialize all the variables here
        IMPLICIT NONE

        INCLUDE   'mpif.h' 
        REAL      :: U,V,K
        REAL      :: START,FINISH
        INTEGER   :: O,M,FILE
        INTEGER   :: MYID,TOTPS, IERR, MPISTTS
        REAL,ALLOCATABLE,DIMENSION(:)   :: A,B,C

    CONTAINS

        SUBROUTINE MPIINIT
            IMPLICIT NONE

            CALL MPI_INIT( IERR ) 
            CALL MPI_COMM_RANK(MPI_COMM_WORLD,MYID,IERR)
            CALL MPI_COMM_SIZE(MPI_COMM_WORLD,TOTPS,IERR)
            RETURN
        END SUBROUTINE MPIINIT
    END MODULE MPI


    PROGRAM CRAFT !!! main program 
        USE   MPI  
        IMPLICIT NONE

        CALL  MPIINIT  
        CALL  CPU_TIME(START)
        CALL  TEST
        CALL  CPU_TIME(FINISH) 
        PRINT*, " TOTAL PROCESSING TIME = " , FINISH - START , "SECONDS AT PROCESS", MYID
        CALL  MPI_FINALIZE(IERR)  
        STOP

    CONTAINS

        SUBROUTINE TEST
            IMPLICIT NONE

            CALL INITIAL
            CALL WORK 
            CALL COLLECT

        END SUBROUTINE TEST

        SUBROUTINE INITIAL   !!! random number input and BCAST
            IMPLICIT NONE

            CALL MPI_BARRIER(MPI_COMM_WORLD,IERR) 
            !! I'm not sure if a barrier is necessary or not here.
            IF (MYID .EQ. 0) THEN
                PRINT*, "ENTER A CONSTANT"
                READ*,  K
                PRINT*, "HOW MANY TERMS?"
                READ*, M
            END IF 

            CALL MPI_BCAST(M,1,MPI_INTEGER,0,MPI_COMM_WORLD,IERR)

            ALLOCATE(A(M),B(M),C(M))

            IF (MYID .EQ. 0) THEN
                CALL INIT_RANDOM_SEED() !!! see the very last subroutine 
                DO O =1,M
                    CALL RANDOM_NUMBER(U)
                    CALL RANDOM_NUMBER(V)
                    A(O) = U*10
                    B(O) = V*10
                    C(O) = 0.0
                END DO
            END IF 
            CALL MPI_BCAST(K,1,MPI_REAL,0,MPI_COMM_WORLD,IERR)
            CALL MPI_BCAST(A,M,MPI_REAL,0,MPI_COMM_WORLD,IERR) 
            CALL MPI_BCAST(B,M,MPI_REAL,0,MPI_COMM_WORLD,IERR)
            IF (MYID .EQ. 0) THEN
                CALL MPI_SEND(C,M,MPI_REAL,1,0,MPI_COMM_WORLD,IERR)  !! tag value is 0
            END IF 
        END SUBROUTINE INITIAL

        SUBROUTINE WORK   !!! simple math calculations 
            IMPLICIT NONE

            INTEGER I

            IF(MYID .EQ. 1) THEN
                CALL MPI_RECV(C,M,MPI_REAL,0,0,MPI_COMM_WORLD,MPISTTS,IERR)
                DO I = 1,M
                    C(I) = (1/K)*( A(I)**K - K*B(0))
                END DO
            END IF
        END SUBROUTINE WORK

        SUBROUTINE COLLECT !! writing txt files
            IMPLICIT NONE

            INTEGER I

            IF(MYID .EQ. 1) THEN
                CALL MPI_SEND(C,M,MPI_REAL,2,1,MPI_COMM_WORLD,IERR) !! tag value is 1
            ELSE IF (MYID .EQ. 2) THEN 
                CALL MPI_RECV(C,M,MPI_REAL,1,1,MPI_COMM_WORLD,MPISTTS,IERR)
                OPEN(UNIT=11,FILE="ARRAY.TXT",ACTION="WRITE") 
                DO I =1,M   
                    WRITE(11,'(I2,2X,F4.1,2X,F4.1,2X,F4.1)') I, A(0),B(I),C(I)
                END DO
                CLOSE(11)
            END IF 
        END SUBROUTINE COLLECT

        SUBROUTINE INIT_RANDOM_SEED() !! I found this subroutine on online 
            IMPLICIT NONE 
            INTEGER :: I,N,CLOCK
            INTEGER, DIMENSION(:), ALLOCATABLE :: SEED

            CALL RANDOM_SEED(SIZE=N)
            ALLOCATE(SEED(N))
            CALL SYSTEM_CLOCK(COUNT=CLOCK)
            SEED = CLOCK + 37 * (/ (I - 1, I = 1, N) /)
            CALL RANDOM_SEED(PUT = SEED)
            DEALLOCATE(SEED)
        END SUBROUTINE INIT_RANDOM_SEED

    END PROGRAM CRAFT

Upvotes: 5

casey
casey

Reputation: 6915

Look at this code:

SUBROUTINE INITIAL   !!! random number input and BCAST
   USE MPI 
   CALL MPI_BARRIER(MPI_COMM_WORLD,IERR) 
   !! I'm not sure if a barrier is necessary or not here.
   IF (MYID .EQ. 0) THEN
   PRINT*, "ENTER A CONSTANT"
   READ*,  K
   PRINT*, "HOW MANY TERMS?"
   READ*, M
   END IF 

MPI rank 0 knows K and M and ranks 1-4 (you specified -np 5) have an uninitialized K and M.

   ALLOCATE(A(M),B(M),C(M))
   CALL INIT_RANDOM_SEED() !!! see the very last subroutine 
   DO O =1,M
       CALL RANDOM_NUMBER(U)
       CALL RANDOM_NUMBER(V)
       A(O) = U*10
       B(O) = V*10
       C(O) = 0.0

Now, all 5 ranks allocate A, B and C of size M, but MPI ranks 1-4 have an uninitialized M and what happens after this departs from both the Fortran and MPI standards and anything goes. You need to broadcast M to all of your ranks before using them. Correcting this will let your program run as you intended it, but...

You are writing new Fortran, and as such, it is best to form better habits about writing your code.

  • use implicit none. If you keep writing code this way, you'll eventually understand why everyone says this.
  • name your module something else beside MPI. While what you have works, it is confusing because it at first it looks like you are using the Fortran90 MPI interface module mpi, rather than your own module which includes the FORTRAN77 interface include mpif.h.
  • After renaming your module, start using the Fortran90 use mpi or the Fortran2008 use mpi_f08 modules. You'll get (slightly) more robust protection from miscalling MPI procedures this way. It isn't perfect, but it is better.
  • O is an odd choice of an iterator, and in at least one place in your code you've written that O as a 0 (bug!).
  • I echo the other answer to use file IO (e.g. a namelist) to load your input, as this can be loaded by all ranks and avoid broadcasting M and K and you'll need do this at some point if you are going to be submitting your MPI jobs non-interactively (which is the normal case in an HPC setting).
  • Modern Fortran understands lowercase letters.
  • I understand this is a work in progress and you are learning, but I find this to be quite an odd MPI program. Right now you parallelize your initialization but then broadcast rank 0's version to the other ranks, overwriting what they did. Your work is done serially, and only in rank 1 and your output is done serially in rank 2.

Upvotes: 3

Related Questions