user2750720

Reputation: 11

Using MPI-I/O to Read a Real Entity in a Fortran Unformatted File

I am trying to read a CFD mesh file through MPI-I/O. The file is in Fortran unformatted format with big-endian byte order, and it contains a mix of integer and real*8 variables (the file starts with block-size integers, followed by the x, y, z coordinates of each block). I can manage to read the leading integers, but the real entities come out completely wrong or noticeably inaccurate. So I simplified the code to reproduce the error: it writes one real value to a file in Fortran unformatted format and tries to read it back, both serially and in parallel (through MPI-I/O):

program readtest
implicit none
include 'mpif.h'

   integer :: myrank,nproc,ierr,istatus(MPI_STATUS_SIZE)
   integer :: mpifile
   integer :: rdsize
   integer(kind=MPI_OFFSET_KIND) :: disp
   character(len=80) :: mpifiname
   double precision :: in,vals,valp

! Define MPI basics
   call MPI_INIT(ierr)
   call MPI_COMM_RANK(MPI_COMM_WORLD,myrank,ierr)
   call MPI_COMM_SIZE(MPI_COMM_WORLD, nproc,ierr)

! Initialize
   in = 1.0/7.0
   vals = 0.0
   valp = 0.0

! Write a serial file
   open(10,file='Serial.dat',form='unformatted')
   write(10) in
   close(10)

! Serial file read
   open(10,file='Serial.dat',form='unformatted',status='old')
   read(10) vals
   close(10)

! Read by MPI-I/O
   mpifiname = 'Serial.dat'

   disp = 0
   call MPI_FILE_OPEN(MPI_COMM_WORLD, mpifiname, &
                      MPI_MODE_RDONLY, &
                      MPI_INFO_NULL, mpifile, ierr)
   call MPI_FILE_SET_VIEW(mpifile,disp,MPI_BYTE,MPI_BYTE,"external32",&
                          MPI_INFO_NULL,ierr)
   rdsize = 0
   if(myrank == 0) rdsize = 1
   call MPI_FILE_READ_ORDERED(mpifile, valp, rdsize, MPI_DOUBLE_PRECISION, &
                               istatus, ierr)
   call MPI_FILE_CLOSE(mpifile, ierr)

   write(*,*) 'Input: ',in,'Serial:',vals,' Parallel:',valp

   call MPI_FINALIZE(ierr)

stop
end

If you compile with the big-endian option (I add the '-convert big_endian' option for the Intel compiler), the value read back through MPI-I/O differs slightly under Intel MPI and is completely wrong under OpenMPI (it seems to be a byte-ordering problem):

mpirun -np 1 ./a.out

 Input:   0.142857149243355      Serial:  0.142857149243355       Parallel:
  0.142857074737549 (from Intel MPI)
 Input:   0.142857149243355      Serial:  0.142857149243355       Parallel:
  3.398201204542405E-312 (from OpenMPI)

If I abandon the big-endian mode (i.e., change MPI_FILE_SET_VIEW's data representation to "native", set disp = 4 to skip the leading 4-byte record marker of the Fortran unformatted format, and drop the extra compilation flag), MPI-I/O reads back exactly the same value. However, since the mesh file is given in big-endian format, I have to keep using the '-convert big_endian' option.
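For reference, the byte-level decoding that the "native" + disp = 4 workaround performs can be sketched in Python with the `struct` module (the hex string below is an assumed reconstruction of the unformatted record holding this value in big-endian mode; this illustrates the byte layout, not the MPI calls themselves):

```python
import struct

# A big-endian Fortran unformatted record holding one real*8:
# 4-byte record marker (length = 8), 8 data bytes, trailing marker.
record = bytes.fromhex("00000008" "3fc24924a0000000" "00000008")

# Skip the leading 4-byte marker and decode 8 bytes as a big-endian double.
value = struct.unpack(">d", record[4:12])[0]
print(value)  # ~0.142857149243355, i.e. 1.0/7.0 rounded to single precision
```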

Using HDF5 instead does not seem easy either, since the file format is shared with other pre- and post-processing codes.

Has anyone had this experience, or does anyone know a remedy?

Best, Jeff

Upvotes: 1

Views: 1228

Answers (1)

Hristo Iliev

Reputation: 74355

The default error handler for communication operations in MPI is MPI_ERRORS_ARE_FATAL, so the program aborts if any kind of communication error happens. The default error handler for file I/O operations, however, is MPI_ERRORS_RETURN: the program continues to execute and an error code is returned. If you examine the value of ierr after the call to MPI_FILE_SET_VIEW, you will notice that Open MPI returns MPI_ERR_UNSUPPORTED_DATAREP. The reason is that Open MPI ships with a version of ROMIO that does not implement the "external32" data representation.

As for the slightly wrong value of the floating point number when using Intel MPI: 0.142857149243355 in 64-bit IEEE 754 is 0x3FC24924A0000000. The external32 representation of this number according to Intel MPI (as one could verify using MPI_PACK_EXTERNAL) is:

A0 00 00 00 3F C2 49 24

This is simply not the IEEE 754 number in big-endian storage. Rather, it is a strange hybrid of big and little endian: the value is split into two 4-byte halves, each half is stored in big endian, but the lower half comes first, as in little endian. Whether this is a bug in Intel's implementation of external32 or an actual quirk of the representation I cannot tell, since the latter is only scarcely described in the MPI standard.
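This half-swapped layout is easy to reproduce outside MPI. A small Python sketch of the byte shuffle just described (an illustration of the observed layout, not of Intel MPI's actual code):

```python
import struct

# 0.142857149243355 as a 64-bit IEEE 754 double in big-endian byte order.
be = bytes.fromhex("3fc24924a0000000")

# Observed Intel MPI external32 output: the two 4-byte halves each stay
# big-endian, but the lower half is emitted first.
hybrid = be[4:8] + be[0:4]
print(hybrid.hex(" ").upper())  # A0 00 00 00 3F C2 49 24
```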

When written in big-endian mode, your unformatted file probably looks like this:

00 00 00 08 3F C2 49 24 A0 00 00 00 00 00 00 08
----------- ^^^^^^^^^^^^^^^^^^^^^^^ -----------
  reclen         record value          reclen

The first 8 bytes as read by MPI_FILE_READ_ORDERED are 00 00 00 08 3F C2 49 24. After Intel MPI converts those bytes back from external32, one obtains 0x3FC2492400000008, which is 0.142857074737549 in 64-bit IEEE 754 representation.
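The misread can be reproduced byte for byte; a Python sketch, assuming the half-swap decoding described above:

```python
import struct

# First 8 bytes of the file: the record marker plus the high half
# of the double.
first8 = bytes.fromhex("00000008" "3fc24924")

# Undoing the assumed half-swap: the lower half was stored first, so
# reassembling plain big-endian order swaps the two halves back.
native_be = first8[4:8] + first8[0:4]   # 3f c2 49 24 00 00 00 08
value = struct.unpack(">d", native_be)[0]
print(value)  # ~0.142857074737549, the wrong value reported above
```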

Upvotes: 4
