Reputation: 11
I am trying to read a CFD mesh file through MPI-I/O. The file is a big-endian Fortran unformatted file containing a mix of integer and real*8 variables (it starts with the block-size integers, followed by the x, y, z coordinates of each block). I can read the leading integers, but the real values come out either completely wrong or slightly inaccurate. I simplified the code to reproduce the error: it writes one real value to a file in Fortran unformatted format and tries to read it back both serially and in parallel (through MPI-I/O):
program readtest
implicit none
include 'mpif.h'
integer :: myrank,nproc,ierr,istatus(MPI_STATUS_SIZE)
integer :: mpifile
integer :: rdsize
integer(kind=MPI_OFFSET_KIND) :: disp
character(len=80) :: mpifiname
double precision :: in,vals,valp
! Define MPI basics
call MPI_INIT(ierr)
call MPI_COMM_RANK(MPI_COMM_WORLD,myrank,ierr)
call MPI_COMM_SIZE(MPI_COMM_WORLD, nproc,ierr)
! Initialize
in = 1.0/7.0
vals = 0.0
valp = 0.0
! Write a serial file
open(10,file='Serial.dat',form='unformatted')
write(10) in
close(10)
! Serial file read
open(10,file='Serial.dat',form='unformatted',status='old')
read(10) vals
close(10)
! Read by MPI-I/O
mpifiname = 'Serial.dat'
disp = 0
call MPI_FILE_OPEN(MPI_COMM_WORLD, mpifiname, &
MPI_MODE_RDONLY, &
MPI_INFO_NULL, mpifile, ierr)
call MPI_FILE_SET_VIEW(mpifile,disp,MPI_BYTE,MPI_BYTE,"external32",&
MPI_INFO_NULL,ierr)
rdsize = 0
if(myrank == 0) rdsize = 1
call MPI_FILE_READ_ORDERED(mpifile, valp, rdsize, MPI_DOUBLE_PRECISION, &
istatus, ierr)
call MPI_FILE_CLOSE(mpifile, ierr)
write(*,*) 'Input: ',in,'Serial:',vals,' Parallel:',valp
call MPI_FINALIZE(ierr)
stop
end
If you compile with the big-endian option (I add '-convert big_endian' for the Intel compiler), the value read through Intel MPI differs slightly from the input, and the value read through OpenMPI is completely wrong (it looks like a byte-order problem):
mpirun -np 1 ./a.out
Input: 0.142857149243355 Serial: 0.142857149243355 Parallel: 0.142857074737549 (from Intel MPI)
Input: 0.142857149243355 Serial: 0.142857149243355 Parallel: 3.398201204542405E-312 (from OpenMPI)
If I abandon the big-endian mode (i.e., set the data representation in MPI_FILE_SET_VIEW to "native", set disp=4 to skip the leading 4-byte record marker of the Fortran unformatted format, and compile without the extra flag), MPI-I/O reads exactly the same value. However, since the mesh file is supplied in big-endian format, I have to keep using the '-convert big_endian' option.
Switching to HDF5 does not seem easy either, since the file format is shared with other pre- and post-processing codes.
Has anyone run into this or know of a remedy?
Best, Jeff
Upvotes: 1
Views: 1228
Reputation: 74355
While the default error handler for communication operations in MPI is MPI_ERRORS_ARE_FATAL, so that the program aborts on any communication error, the default error handler for file I/O operations is MPI_ERRORS_RETURN, which means that the program continues to execute and an error code is returned. If you examine the value of ierr after the call to MPI_FILE_SET_VIEW, you will notice that with Open MPI it returns MPI_ERR_UNSUPPORTED_DATAREP. The reason is that Open MPI ships with a version of ROMIO that does not implement the external32 data representation.
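A minimal sketch of the check described above, assuming the file Serial.dat from the question exists. The program name check_view and the variables fh, msg, msglen and ierr2 are illustrative names, not part of the original code. It only shows how the return code of MPI_FILE_SET_VIEW could be inspected and translated into text with MPI_ERROR_STRING:

```fortran
program check_view
  implicit none
  include 'mpif.h'
  integer :: ierr, ierr2, fh, msglen
  integer(kind=MPI_OFFSET_KIND) :: disp
  character(len=MPI_MAX_ERROR_STRING) :: msg

  call MPI_INIT(ierr)

  ! File operations use MPI_ERRORS_RETURN by default, so ierr must be checked.
  call MPI_FILE_OPEN(MPI_COMM_WORLD, 'Serial.dat', MPI_MODE_RDONLY, &
                     MPI_INFO_NULL, fh, ierr)
  if (ierr /= MPI_SUCCESS) then
     call MPI_ERROR_STRING(ierr, msg, msglen, ierr2)
     write(*,*) 'MPI_FILE_OPEN failed: ', msg(1:msglen)
     call MPI_ABORT(MPI_COMM_WORLD, 1, ierr2)
  end if

  ! Request external32; an implementation without support reports an error here.
  disp = 0
  call MPI_FILE_SET_VIEW(fh, disp, MPI_BYTE, MPI_BYTE, 'external32', &
                         MPI_INFO_NULL, ierr)
  if (ierr /= MPI_SUCCESS) then
     call MPI_ERROR_STRING(ierr, msg, msglen, ierr2)
     write(*,*) 'MPI_FILE_SET_VIEW failed: ', msg(1:msglen)
  else
     write(*,*) 'external32 view accepted'
  end if

  call MPI_FILE_CLOSE(fh, ierr)
  call MPI_FINALIZE(ierr)
end program check_view
```

The same pattern can be applied after every MPI_FILE_* call in the original program, so that an unsupported data representation is reported instead of silently producing garbage.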
As for the slightly wrong value of the floating-point number when using Intel MPI: 0.142857149243355 in 64-bit IEEE 754 is 0x3FC24924A0000000. The external32 representation of this number according to Intel MPI (as one can verify using MPI_PACK_EXTERNAL) is:
A0 00 00 00 3F C2 49 24
This is simply not the IEEE 754 number in big-endian storage. Rather, it is a strange hybrid of big and little endian: the value is split into two halves, each stored in big endian, but with the lower half first as in little endian. Whether this is a bug in Intel's implementation of external32 or an actual quirk of the representation I cannot tell, since the latter is only sparsely described in the MPI standard.
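The MPI_PACK_EXTERNAL check mentioned above could look roughly like the following sketch. The program name pack_ext32 and the variables buf, outsize and position are illustrative. The value is deliberately computed as a single-precision 1.0/7.0, as in the question, so that it matches 0x3FC24924A0000000. The bytes printed depend on the MPI implementation, which is the point of the experiment:

```fortran
program pack_ext32
  use iso_fortran_env, only: int8
  implicit none
  include 'mpif.h'
  integer :: ierr, i
  integer(kind=MPI_ADDRESS_KIND) :: outsize, position
  double precision :: val
  integer(int8) :: buf(8)

  call MPI_INIT(ierr)

  ! Single-precision quotient widened to double, as in the question.
  val = 1.0/7.0

  ! Pack one double into an 8-byte buffer using the external32 representation.
  outsize  = 8
  position = 0
  call MPI_PACK_EXTERNAL('external32', val, 1, MPI_DOUBLE_PRECISION, &
                         buf, outsize, position, ierr)

  ! Print the packed bytes in hexadecimal, in storage order.
  write(*,'(8(Z2.2,1X))') (iand(int(buf(i)), 255), i = 1, 8)

  call MPI_FINALIZE(ierr)
end program pack_ext32
```

Comparing the printed bytes against 3F C2 49 24 A0 00 00 00 shows whether a given implementation produces plain big-endian output or the half-swapped layout quoted above.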
Your unformatted file probably looks like this when written in big-endian byte order:
00 00 00 08 3F C2 49 24 A0 00 00 00 00 00 00 08
----------- ^^^^^^^^^^^^^^^^^^^^^^^ -----------
reclen      record value            reclen
The first 8 bytes, as read by MPI_FILE_READ_ORDERED, are 00 00 00 08 3F C2 49 24. After Intel MPI converts those bytes back from external32, one obtains 0x3FC2492400000008, which is 0.142857074737549 in 64-bit IEEE 754 representation.
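The arithmetic above can be reproduced without MPI. The sketch below assumes the half-swapped layout described in this answer (the first four bytes become the lower 32 bits, the next four the upper 32 bits); the program and variable names are illustrative. It reassembles the 64-bit pattern and reinterprets it as a double:

```fortran
program decode_hybrid
  use iso_fortran_env, only: int64, real64
  implicit none
  integer(int64) :: first4, last4, bits
  real(real64)   :: val

  ! First four bytes of the file: the record-length marker 00 00 00 08.
  first4 = int(z'00000008', int64)
  ! Next four bytes: the upper half of the stored value, 3F C2 49 24.
  last4  = int(z'3FC24924', int64)

  ! Assumed hybrid layout: the second word forms the upper 32 bits,
  ! the first word the lower 32 bits.
  bits = ior(ishft(last4, 32), first4)

  ! Reinterpret the 64-bit pattern as an IEEE 754 double.
  val = transfer(bits, val)

  write(*,'(A,Z16.16)') 'bits  = 0x', bits
  write(*,'(A,F17.15)') 'value = ', val
end program decode_hybrid
```

The printed bit pattern and value can be compared with 0x3FC2492400000008 and the "Parallel" result reported for Intel MPI in the question.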
Upvotes: 4