Cagatay

Reputation: 1

Creating an HDF5 file and datasets with Open MPI

I need to write my HDF5 datasets to a single HDF5 file in parallel, and I want the file to be created by just one thread. To do this I can use an if statement like:

if (currentThread == 0)
{
    createHDF5File();
}

But I don't know which thread will get there first. For example, if thread 1 arrives first, it tries to write a dataset to a file that doesn't exist yet. Is there any way to designate a first thread, or is there a better way to do this?
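To make it concrete, here is a minimal sketch of the pattern I mean (createHDF5File and the file name output.h5 are just placeholders, and currentThread above is really the MPI rank):

#include <mpi.h>
#include <hdf5.h>

/* Placeholder for my helper that creates the shared file. */
void createHDF5File(void)
{
    hid_t f = H5Fcreate("output.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
    H5Fclose(f);
}

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* "currentThread" is this rank */

    if (rank == 0) {
        createHDF5File();   /* I want exactly one process to do the create... */
    }
    /* ...but another rank may reach its dataset write before the file exists. */

    MPI_Finalize();
    return 0;
}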

Upvotes: 0

Views: 717

Answers (2)

Timothy Brown

Reputation: 2280

It sounds like you really should be using parallel I/O with HDF5. HDF5 can use MPI-IO under the hood if it was built with parallel support (typically configured with --enable-parallel).

Here's a sample program (in Fortran).

! Program to use MPI_Cart and Parallel HDF5
!
program hdf_pwrite

        use, intrinsic :: iso_c_binding, only: c_double
        use mpi
        use hdf5
        use kinds, only : r_dp

        implicit none

        ! external interface
        interface
                subroutine get_walltime(t) &
                                bind(c, name="get_walltime")
                                import :: c_double
                                real(kind=c_double), intent(out) :: t
                end subroutine get_walltime
        end interface

        ! Local 4000x4000 with a 1x1 halo
        integer, parameter :: ndims = 2
        integer, parameter :: N     = 4000
        integer, parameter :: halo  = 1

        integer :: argc                        ! Command line args
        integer :: ierr                        ! Error status
        integer :: id                          ! My rank/ID
        integer :: np                          ! Number of processors
        integer :: iunit                       ! File descriptor
        integer :: i,j                         ! Loop indexers
        integer :: total                       ! Total dimension size
        integer :: lcount                      ! Lustre stripe count
        integer :: lsize                       ! Lustre stripe size
        character(len=1024) :: clcount, clsize ! Strings of LFS
        integer :: info                        ! MPI IO Info
        integer :: m_dims(ndims)               ! MPI cart dims
        integer :: coords(ndims)               ! Co-ords of procs in the grid
        logical :: is_periodic(ndims)          ! Periodic boundary conditions
        logical :: reorder                     ! Reorder the MPI structure
        integer :: MPI_COMM_2D                 ! New communicator

        integer(KIND=MPI_OFFSET_KIND) :: offset

        character(len=1024) :: filename
        integer(kind=hid_t) :: p_id, f_id, x_id, d_id
        integer(kind=hid_t) :: memspace, filespace
        ! Local hyperslab info
        integer(kind=hsize_t) :: d_size(ndims), s_size(ndims), h_size(ndims),&
                                 stride(ndims), block(ndims)
        ! Global hyperslab info
        integer(kind=hsize_t) :: g_size(ndims), g_start(ndims)

        real(kind=r_dp), allocatable :: ld(:,:)
        ! Timing vars
        real(kind=r_dp) :: s, e, dt, mdt

        argc = 0
        ierr = 0
        offset = 0
        m_dims = (/ 0, 0/)
        is_periodic = .false.      ! Non-periodic
        reorder     = .false.      ! Not allowed to reorder

        call mpi_init(ierr)

        ! Set up the MPI cartesian topology
        call mpi_comm_size(MPI_COMM_WORLD, np, ierr)
        call mpi_dims_create(np, ndims, m_dims, ierr)

        call mpi_cart_create(MPI_COMM_WORLD, ndims, m_dims, is_periodic, &
                             reorder, MPI_COMM_2D, ierr)
        call mpi_comm_rank(MPI_COMM_2D, id, ierr)
        call mpi_cart_coords(MPI_COMM_2D, id, ndims, coords, ierr)

        if (id .eq. 0) then
                if (mod(N,np) .ne. 0) then
                        write(0,*) 'Number of procs must evenly divide N.'
                        call mpi_abort(MPI_COMM_WORLD, 1, ierr)
                endif

                ! get the filename
                argc = command_argument_count()
                if (argc .lt. 1 ) then
                        write(0, *) 'Must supply a filename'
                        call exit(1)
                endif
                call get_command_argument(1, filename)
        endif

        ! Broadcast the filename
        call mpi_bcast(filename, len(filename), MPI_CHARACTER, 0, &
                       MPI_COMM_WORLD, ierr)

        ! Init the HDF5 library
        call h5open_f(ierr)

        ! Set a stripe count of 4 and a stripe size of 4MB
        lcount = 4
        lsize  = 4 * 1024 * 1024
        write(clcount, '(I4)') lcount
        write(clsize, '(I8)') lsize

        call mpi_info_create(info, ierr)
        call mpi_info_set(info, "striping_factor", trim(clcount), ierr)
        call mpi_info_set(info, "striping_unit", trim(clsize), ierr)

        ! Set up the access properties
        call h5pcreate_f(H5P_FILE_ACCESS_F, p_id, ierr)
        call h5pset_fapl_mpio_f(p_id, MPI_COMM_2D, info, ierr)

        ! Open the file
        call h5fcreate_f(filename, H5F_ACC_TRUNC_F, f_id, ierr, &
                         access_prp = p_id)
        if (ierr .ne. 0) then
                write(0,*) 'Unable to open: ', trim(filename), ': ', ierr
                call mpi_abort(MPI_COMM_WORLD, 1, ierr)
        endif

        ! Generate our 4000x4000 matrix with a 1x1 halo
        total = N + 2 * halo
        allocate(ld(0:total-1, 0:total-1))

        ld = -99.99
        ! init the local data
        do j = 1, N
                do i = 1, N
                        ld(i,j) = (i - 1 + (j-1)*N)
                enddo
        enddo

        ! Create the local memory space and hyperslab
        do i = 1, ndims
                d_size(i) = total
                s_size(i) = N
                h_size(i) = halo
                stride(i) = 1
                block(i)  = 1
        enddo

        call h5screate_simple_f(ndims, d_size, memspace, ierr)
        call h5sselect_hyperslab_f(memspace, H5S_SELECT_SET_F, &
                                   h_size, s_size, ierr,       &
                                   stride, block)

        ! Create the global file space and hyperslab
        do i = 1, ndims
                g_size(i)  = N * m_dims(i)
                g_start(i) = N * coords(i)
        enddo

        call h5screate_simple_f(ndims, g_size, filespace, ierr)
        call h5sselect_hyperslab_f(filespace, H5S_SELECT_SET_F, &
                                   g_start, s_size, ierr,       &
                                   stride, block)

        ! Create a data transfer property
        call h5pcreate_f(H5P_DATASET_XFER_F, x_id, ierr)
        call h5pset_dxpl_mpio_f(x_id, H5FD_MPIO_COLLECTIVE_F, ierr)

        ! Create the dataset id
        call h5dcreate_f(f_id, "/data", H5T_IEEE_F64LE, filespace, d_id, &
                         ierr)


        ! Write the data
        call get_walltime(s)
        call h5dwrite_f(d_id, H5T_NATIVE_DOUBLE, ld, s_size, ierr,      &
                        file_space_id=filespace, mem_space_id=memspace, &
                        xfer_prp=x_id)
        call get_walltime(e)

        dt = e - s
        call mpi_reduce(dt, mdt, 1, MPI_DOUBLE_PRECISION, MPI_SUM, 0, MPI_COMM_2D, ierr)

        if (id .eq. 0) then
                write(6,*) mdt / np
        endif

        if (allocated(ld)) then
                deallocate(ld)
        endif

        ! Close everything and exit
        call h5dclose_f(d_id, ierr)
        call h5sclose_f(filespace, ierr)
        call h5sclose_f(memspace, ierr)
        call h5pclose_f(x_id, ierr)
        call h5pclose_f(p_id, ierr)
        call h5fclose_f(f_id, ierr)
        call h5close_f(ierr)

        call mpi_finalize(ierr)
end program hdf_pwrite

Please note this is my teaching example that I have the class play with interactively, so there are a few different things in it:

  1. I introduce iso_c_binding because we have a C wrapper around a timing routine (gettimeofday); a sketch of that wrapper is given after this list.
  2. I use MPI topologies.
  3. The root rank is the only one that processes the filename to write, which is then broadcast to all ranks.
  4. We set a stripe count and size for the Lustre file system.
  5. We use hyperslabs for data placement.
  6. We use an MPI-IO collective write.
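For completeness: the program above also needs the get_walltime C wrapper and a kinds module providing r_dp, neither of which is shown here. r_dp is just a double-precision real kind (e.g. selected_real_kind(15, 307)), and the timing wrapper could look roughly like the sketch below (the exact original isn't included in this answer); compile it alongside the Fortran source, e.g. with the h5pfc wrapper.

/* get_walltime.c -- minimal sketch of the timing helper assumed by the
 * Fortran interface above (bind(c, name="get_walltime")); returns the
 * current wall-clock time in seconds via gettimeofday. */
#include <stddef.h>
#include <sys/time.h>

void get_walltime(double *t)
{
        struct timeval tv;
        gettimeofday(&tv, NULL);
        *t = (double) tv.tv_sec + (double) tv.tv_usec * 1.0e-6;
}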

Hope this helps.

Upvotes: 1

Tomas Aschan

Reputation: 60584

Do you calculate the data you want to write in parallel? If so, you want to make sure that all workers have finished their processing before you write, so that your data is in fact complete.

In other words,

// Collect all the data using some form of MPI_Gather, MPI_Reduce
// or whatever. I'll just put this here as a proof of concept.
MPI_Barrier(MPI_COMM_WORLD);

// Now all the threads have "joined", so you can write from rank 0 without
// worrying that some other thread got here way before
if (currentThread == 0) { createHDF5File(); }

If not, I assume you want to write data from each thread. Why not just write it to different files?

// Calculate stuff on each thread
// Then write to different files depending on thread num
createHDF5File(currentThread); // Chooses file name that includes the thread num
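For example, a minimal sketch of such a helper in C (the helper signature and file-name pattern are just illustrative):

#include <stdio.h>
#include <hdf5.h>

/* Each thread/rank creates its own file whose name encodes its number. */
hid_t createHDF5File(int currentThread)
{
    char name[64];
    snprintf(name, sizeof name, "output_%04d.h5", currentThread);
    /* Serial HDF5 is fine here: exactly one writer per file. */
    return H5Fcreate(name, H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
}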

Upvotes: 0
