user1048419

Reputation: 49

2D FFTW with MPI is too slow

I want to do a 2D FFT with FFTW and MPI. I followed the code shown in the answer to the previous post How to do a fftw3 MPI "transposed" 2D transform if possible at all?.

However, the performance is really bad: the run with 2 processes is actually slower than the run with 1 process. What is wrong with it?

program trashingfftw
  use, intrinsic :: iso_c_binding
  use MPI

  implicit none
  include 'fftw3-mpi.f03'

  integer(C_INTPTR_T), parameter :: L = 256
  integer(C_INTPTR_T), parameter :: M = 256

  type(C_PTR) :: plan, ctgt, csrc

  complex(C_DOUBLE_COMPLEX), pointer :: src(:,:)
  real(C_DOUBLE), pointer :: tgt(:,:)

  integer(C_INTPTR_T) :: alloc_local, local_M, &
                         & local_L,local_offset1,local_offset2

  integer :: ierr,id


  call mpi_init(ierr)

  call mpi_comm_rank(MPI_COMM_WORLD,id,ierr)

  call fftw_mpi_init()


  ! Distribution of the complex (input) array: L/2+1 complex elements
  ! along the contiguous dimension, split over the last dimension M.
  alloc_local = fftw_mpi_local_size_2d(M, L/2+1, MPI_COMM_WORLD, &
       local_M, local_offset1)

  csrc = fftw_alloc_complex(alloc_local)
  ! The half-complex array has L/2+1 (not L/2) elements per row.
  call c_f_pointer(csrc, src, [L/2+1, local_M])


  ! Distribution of the real (output) array in the transposed layout:
  ! the padded real dimension is 2*(L/2+1).
  alloc_local = fftw_mpi_local_size_2d(2*(L/2+1), M, MPI_COMM_WORLD, &
       &                               local_L, local_offset2)

  ctgt = fftw_alloc_real(alloc_local)
  call c_f_pointer(ctgt, tgt, [M, local_L])

  ! Plan creation: with FFTW_MEASURE this runs and times many trial
  ! transforms, so it is usually far more expensive than one execution.
  plan = fftw_mpi_plan_dft_c2r_2d(M, L, src, tgt, MPI_COMM_WORLD, &
       ior(FFTW_MEASURE, FFTW_MPI_TRANSPOSED_OUT))

  call fftw_mpi_execute_dft_c2r(plan, src, tgt)

  call fftw_destroy_plan(plan)
  call fftw_free(csrc)
  call fftw_free(ctgt)

  call mpi_finalize(ierr)


end program trashingfftw

Upvotes: 2

Views: 382

Answers (1)

Paul R

Reputation: 213170

Note that plan creation typically takes much longer than the FFT itself, particularly if you use FFTW_MEASURE or a more exhaustive planning mode such as FFTW_PATIENT.

Ideally you should create the plan once and then run multiple FFTs with that same plan.

For benchmarking you should measure only the FFT execution time, not the plan creation time.
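As a minimal, self-contained sketch of such a measurement, based on the question's program (the program name benchfftw and the repetition count nrep are illustrative choices of mine; MPI_Wtime and mpi_barrier are standard MPI):

program benchfftw
  use, intrinsic :: iso_c_binding
  use MPI
  implicit none
  include 'fftw3-mpi.f03'

  integer(C_INTPTR_T), parameter :: L = 256, M = 256
  integer, parameter :: nrep = 100   ! arbitrary repetition count
  type(C_PTR) :: plan, csrc, ctgt
  complex(C_DOUBLE_COMPLEX), pointer :: src(:,:)
  real(C_DOUBLE), pointer :: tgt(:,:)
  integer(C_INTPTR_T) :: alloc_local, local_M, local_L, off1, off2
  integer :: ierr, id, i
  double precision :: t0, t1, t2

  call mpi_init(ierr)
  call mpi_comm_rank(MPI_COMM_WORLD, id, ierr)
  call fftw_mpi_init()

  ! Same data distribution as in the question.
  alloc_local = fftw_mpi_local_size_2d(M, L/2+1, MPI_COMM_WORLD, local_M, off1)
  csrc = fftw_alloc_complex(alloc_local)
  call c_f_pointer(csrc, src, [L/2+1, local_M])

  alloc_local = fftw_mpi_local_size_2d(2*(L/2+1), M, MPI_COMM_WORLD, local_L, off2)
  ctgt = fftw_alloc_real(alloc_local)
  call c_f_pointer(ctgt, tgt, [M, local_L])

  call mpi_barrier(MPI_COMM_WORLD, ierr)
  t0 = MPI_Wtime()

  ! Plan once; with FFTW_MEASURE this is the expensive step.
  plan = fftw_mpi_plan_dft_c2r_2d(M, L, src, tgt, MPI_COMM_WORLD, &
       ior(FFTW_MEASURE, FFTW_MPI_TRANSPOSED_OUT))

  t1 = MPI_Wtime()

  ! Initialize after planning: FFTW_MEASURE overwrites the arrays.
  src = (0.0_C_DOUBLE, 0.0_C_DOUBLE)

  ! Time nrep executions of the same plan. (c2r transforms destroy
  ! their input, which does not matter for pure timing.)
  do i = 1, nrep
     call fftw_mpi_execute_dft_c2r(plan, src, tgt)
  end do

  call mpi_barrier(MPI_COMM_WORLD, ierr)
  t2 = MPI_Wtime()

  if (id == 0) then
     print '(A,F12.6,A)', 'plan creation: ', t1 - t0, ' s'
     print '(A,F12.6,A)', 'per transform: ', (t2 - t1) / nrep, ' s'
  end if

  call fftw_destroy_plan(plan)
  call fftw_free(csrc)
  call fftw_free(ctgt)
  call mpi_finalize(ierr)

end program benchfftw

With FFTW_MEASURE you should see the planning time dominate the per-transform time, which is why timing the whole program makes the 2-process run look slower.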

Upvotes: 2
