Reputation: 77
Here is the basic info about the code I wrote here. I recently started to learn and write MPI. The purpose of the this code to figure out the computation time for given number of total processes. I am going to post the full code. it's 97 lines long, but I will point out it where I think the problem is..
Basic idea
(1) User will input a constant(K) and a integer(M),then BCAST both vaules.
(2) Three 1D arrays(A,B,C) will be allocated with M blocks.
(3) A subroutine(init_random_seed) will fill arrays A and B with M random numbers and BCAST it.
(4) Array C was filled with zeros and send to process==1, and at process==1 a simple math calculation will be done among arrays A and B.
(5) Results from each iteration will be stored in array C and send to process==2 by using MPI_SEND.
(6) Finally at process==2, it will write the results of C in a text file.
so here is the code,
MODULE MPI !!! I usually initialize all the variables here
INCLUDE 'mpif.h'
REAL :: U,V,K
REAL :: START,FINISH
INTEGER :: O,M,FILE
INTEGER :: MYID,TOTPS, IERR
REAL,ALLOCATABLE,DIMENSION(:) :: A,B,C
END MODULE MPI
PROGRAM CRAFT !!! main program
USE MPI
CALL MPIINIT
CALL CPU_TIME(START)
CALL TEST
CALL CPU_TIME(FINISH)
PRINT*, " TOTAL PROCESSING TIME = " , FINISH - START , "SECONDS AT PROCESS", MYID
CALL MPI_FINALIZE(IERR)
STOP
END PROGRAM CRAFT
SUBROUTINE MPIINIT
USE MPI
CALL MPI_INIT( IERR )
CALL MPI_COMM_RANK(MPI_COMM_WORLD,MYID,IERR)
CALL MPI_COMM_SIZE(MPI_COMM_WORLD,TOTPS,IERR)
RETURN
END SUBROUTINE MPIINIT
SUBROUTINE TEST
USE MPI
CALL INITIAL
CALL WORK
CALL COLLECT
END SUBROUTINE TEST
SUBROUTINE INITIAL !!! random number input and BCAST
USE MPI
CALL MPI_BARRIER(MPI_COMM_WORLD,IERR)
!! I'm not sure if a barrier is necessary or not here.
IF (MYID .EQ. 0) THEN
PRINT*, "ENTER A CONSTANT"
READ*, K
PRINT*, "HOW MANY TERMS?"
READ*, M
END IF
ALLOCATE(A(M),B(M),C(M))
CALL INIT_RANDOM_SEED() !!! see the very last subroutine
DO O =1,M
CALL RANDOM_NUMBER(U)
CALL RANDOM_NUMBER(V)
A(O) = U*10
B(O) = V*10
C(O) = 0.0
END DO
CALL MPI_BCAST(K,1,MPI_REAL,0,MPI_COMM_WORLD,IERR)
CALL MPI_BCAST(A,M,MPI_REAL,0,MPI_COMM_WORLD,IERR)
CALL MPI_BCAST(B,M,MPI_REAL,0,MPI_COMM_WORLD,IERR)
CALL MPI_BCAST(M,1,MPI_INTEGER,0,MPI_COMM_WORLD,IERR)
CALL MPI_SEND(C,M,MPI_REAL,1,0,MPI_COMM_WORLD,IERR) !! tag value is 0
END SUBROUTINE INITIAL
SUBROUTINE WORK !!! simple math calculations
USE MPI
IF(MYID .EQ. 1) THEN
CALL MPI_RECV(C,M,MPI_REAL,0,0,MPI_COMM_WORLD,MPISTTS,IERR)
DO O = 1,M
C(O) = (1/K)*( A(O)**K - K*B(0))
END DO
CALL MPI_SEND(C,M,MPI_REAL,2,1,MPI_COMM_WORLD,IERR) !! tag value is 1
END IF
END SUBROUTINE WORK
SUBROUTINE COLLECT !! writing txt files
USE MPI
IF (MYID .EQ. 2) THEN
CALL MPI_RECV(C,M,MPI_REAL,1,1,MPI_COMM_WORLD,MPISTTS,IERR)
OPEN(UNIT=11,FILE="ARRAY.TXT",ACTION="WRITE")
DO O =1,M
WRITE(11,'(I2,2X,F4.1,2X,F4.1,2X,F4.1)') O, A(0),B(O),C(O)
END DO
CLOSE(11)
END IF
END SUBROUTINE COLLECT
SUBROUTINE INIT_RANDOM_SEED() !! I found this subroutine on online
IMPLICIT NONE
INTEGER :: I,N,CLOCK
INTEGER, DIMENSION(:), ALLOCATABLE :: SEED
CALL RANDOM_SEED(SIZE=N)
ALLOCATE(SEED(N))
CALL SYSTEM_CLOCK(COUNT=CLOCK)
SEED = CLOCK + 37 * (/ (I - 1, I = 1, N) /)
CALL RANDOM_SEED(PUT = SEED)
DEALLOCATE(SEED)
END SUBROUTINE INIT_RANDOM_SEED
The reason why I'm here
* the program complies but I think I am getting a runtime error. Here is the error,
ENTER A CONSTANT
2
HOW MANY TERMS?
3
[sflogin0:11103] *** An error occurred in MPI_Bcast
[sflogin0:11103] *** on communicator MPI_COMM_WORLD
[sflogin0:11103] *** MPI_ERR_TRUNCATE: message truncated
[sflogin0:11103] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
TOTAL PROCESSING TIME = 2.9265954 SECONDS AT PROCESS 0
--------------------------------------------------------------------------
mpirun has exited due to process rank 1 with PID 11103 on
node sflogin0 exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
[sflogin0:11099] 2 more process has sent help message help-mpi-errors.txt / mpi_errors_are_fatal
[sflogin0:11099] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
Note: I only gave five processes when i was running it mpirun -np 5 ./a.out
Please take a look at it and help me out. Thanks
Upvotes: 3
Views: 623
Reputation: 4656
In the SUBROUTINE INITIAL
you have to:
Broadcast M
before doing the allocation; apart from process 0, no other process is really allocating. With default integer to zero, they are allocating zero size vector, and this will be a problem in broadcast.
CALL MPI_BCAST(M,1,MPI_INTEGER,0,MPI_COMM_WORLD,IERR)
ALLOCATE(A(M),B(M),C(M))
The next bullets are not really part of the current problem, but the next problems you will face.
If only one process is getting the random numbers as you described and implement, that part should be done in an if, to make sure that only that process is doing that part.
IF (MYID == 0) THEN
!generate the numbers
END IF
I strongly recommend you group the MPI_SEND
and MPI_RECV
next to each other. In that respect you should move the MPI_SEND
from SUBROUTINE INITIAL
to SUBROUTINE WORK
and also move the other one from SUBROUTINE WORK
to SUBROUTINE COLLECT
. This will save you time in debugging. Do not forget to also put the first MPI_SEND
in an IF
statement
IF (MYID .EQ. 0) THEN
CALL MPI_SEND(C,M,MPI_REAL,1,0,MPI_COMM_WORLD,IERR) !! tag value is 0
END IF
I also recommend using files for all input instead of standard input when doing parallel.
Finally, just to give you a hint on code organization, the full code might look like (see bellow).
O
for variable name is a very bad idea, I changed it to I and set it as local variable. Now, you want to make sure that the index 0
was really for zero and not a typo where you wanted to put O
. Also, I added IMPLICIT NONE
to force the declaration of all variable. I took the subroutine that is all MPI into the module and all other in the CONTAINS
section of the PROGRAM
.
MODULE MPI !!! I usually initialize all the variables here
IMPLICIT NONE
INCLUDE 'mpif.h'
REAL :: U,V,K
REAL :: START,FINISH
INTEGER :: O,M,FILE
INTEGER :: MYID,TOTPS, IERR, MPISTTS
REAL,ALLOCATABLE,DIMENSION(:) :: A,B,C
CONTAINS
SUBROUTINE MPIINIT
IMPLICIT NONE
CALL MPI_INIT( IERR )
CALL MPI_COMM_RANK(MPI_COMM_WORLD,MYID,IERR)
CALL MPI_COMM_SIZE(MPI_COMM_WORLD,TOTPS,IERR)
RETURN
END SUBROUTINE MPIINIT
END MODULE MPI
PROGRAM CRAFT !!! main program
USE MPI
IMPLICIT NONE
CALL MPIINIT
CALL CPU_TIME(START)
CALL TEST
CALL CPU_TIME(FINISH)
PRINT*, " TOTAL PROCESSING TIME = " , FINISH - START , "SECONDS AT PROCESS", MYID
CALL MPI_FINALIZE(IERR)
STOP
CONTAINS
SUBROUTINE TEST
IMPLICIT NONE
CALL INITIAL
CALL WORK
CALL COLLECT
END SUBROUTINE TEST
SUBROUTINE INITIAL !!! random number input and BCAST
IMPLICIT NONE
CALL MPI_BARRIER(MPI_COMM_WORLD,IERR)
!! I'm not sure if a barrier is necessary or not here.
IF (MYID .EQ. 0) THEN
PRINT*, "ENTER A CONSTANT"
READ*, K
PRINT*, "HOW MANY TERMS?"
READ*, M
END IF
CALL MPI_BCAST(M,1,MPI_INTEGER,0,MPI_COMM_WORLD,IERR)
ALLOCATE(A(M),B(M),C(M))
IF (MYID .EQ. 0) THEN
CALL INIT_RANDOM_SEED() !!! see the very last subroutine
DO O =1,M
CALL RANDOM_NUMBER(U)
CALL RANDOM_NUMBER(V)
A(O) = U*10
B(O) = V*10
C(O) = 0.0
END DO
END IF
CALL MPI_BCAST(K,1,MPI_REAL,0,MPI_COMM_WORLD,IERR)
CALL MPI_BCAST(A,M,MPI_REAL,0,MPI_COMM_WORLD,IERR)
CALL MPI_BCAST(B,M,MPI_REAL,0,MPI_COMM_WORLD,IERR)
IF (MYID .EQ. 0) THEN
CALL MPI_SEND(C,M,MPI_REAL,1,0,MPI_COMM_WORLD,IERR) !! tag value is 0
END IF
END SUBROUTINE INITIAL
SUBROUTINE WORK !!! simple math calculations
IMPLICIT NONE
INTEGER I
IF(MYID .EQ. 1) THEN
CALL MPI_RECV(C,M,MPI_REAL,0,0,MPI_COMM_WORLD,MPISTTS,IERR)
DO I = 1,M
C(I) = (1/K)*( A(I)**K - K*B(0))
END DO
END IF
END SUBROUTINE WORK
SUBROUTINE COLLECT !! writing txt files
IMPLICIT NONE
INTEGER I
IF(MYID .EQ. 1) THEN
CALL MPI_SEND(C,M,MPI_REAL,2,1,MPI_COMM_WORLD,IERR) !! tag value is 1
ELSE IF (MYID .EQ. 2) THEN
CALL MPI_RECV(C,M,MPI_REAL,1,1,MPI_COMM_WORLD,MPISTTS,IERR)
OPEN(UNIT=11,FILE="ARRAY.TXT",ACTION="WRITE")
DO I =1,M
WRITE(11,'(I2,2X,F4.1,2X,F4.1,2X,F4.1)') I, A(0),B(I),C(I)
END DO
CLOSE(11)
END IF
END SUBROUTINE COLLECT
SUBROUTINE INIT_RANDOM_SEED() !! I found this subroutine on online
IMPLICIT NONE
INTEGER :: I,N,CLOCK
INTEGER, DIMENSION(:), ALLOCATABLE :: SEED
CALL RANDOM_SEED(SIZE=N)
ALLOCATE(SEED(N))
CALL SYSTEM_CLOCK(COUNT=CLOCK)
SEED = CLOCK + 37 * (/ (I - 1, I = 1, N) /)
CALL RANDOM_SEED(PUT = SEED)
DEALLOCATE(SEED)
END SUBROUTINE INIT_RANDOM_SEED
END PROGRAM CRAFT
Upvotes: 5
Reputation: 6915
Look at this code:
SUBROUTINE INITIAL !!! random number input and BCAST
USE MPI
CALL MPI_BARRIER(MPI_COMM_WORLD,IERR)
!! I'm not sure if a barrier is necessary or not here.
IF (MYID .EQ. 0) THEN
PRINT*, "ENTER A CONSTANT"
READ*, K
PRINT*, "HOW MANY TERMS?"
READ*, M
END IF
MPI rank 0 knows K
and M
and ranks 1-4 (you specified -np 5
) have an uninitialized K
and M
.
ALLOCATE(A(M),B(M),C(M))
CALL INIT_RANDOM_SEED() !!! see the very last subroutine
DO O =1,M
CALL RANDOM_NUMBER(U)
CALL RANDOM_NUMBER(V)
A(O) = U*10
B(O) = V*10
C(O) = 0.0
Now, all 5 ranks allocate A
, B
and C
of size M
, but MPI ranks 1-4 have an uninitialized M
and what happens after this departs from both the Fortran and MPI standards and anything goes. You need to broadcast M
to all of your ranks before using them. Correcting this will let your program run as you intended it, but...
You are writing new Fortran, and as such, it is best to form better habits about writing your code.
implicit none
. If you keep writing code this way, you'll eventually understand why everyone says this.MPI
. While what you have works, it is confusing because it at first it looks like you are using the Fortran90 MPI interface module mpi
, rather than your own module which includes the FORTRAN77 interface include mpif.h
.use mpi
or the Fortran2008 use mpi_f08
modules. You'll get (slightly) more robust protection from miscalling MPI procedures this way. It isn't perfect, but it is better.O
is an odd choice of an iterator, and in at least one place in your code you've written that O
as a 0
(bug!).M
and K
and you'll need do this at some point if you are going to be submitting your MPI jobs non-interactively (which is the normal case in an HPC setting).Upvotes: 3