Fornax-A

Reputation: 1032

Serial program faster than parallel Fortran

I am trying to test a Fortran MPI program to see how much speedup I can gain from MPI compared with serial programming.

I first compile the program with:

mpif90 mpi_pi_reduce.f -o mpi_pi

and then, because I had a problem locating mpirun, I launched the program as follows:

/usr/bin/mpirun -np 4 ./mpi_pi
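
For reference, here is a minimal free-form sketch of what this kind of dartboard pi example typically looks like (the names ROUNDS, DARTS and dboard are assumptions here, not necessarily what mpi_pi_reduce.f actually uses):

program mpi_pi_sketch
  use mpi
  implicit none
  integer, parameter :: ROUNDS = 100, DARTS = 5000   ! assumed sizes
  integer :: ierr, rank, nprocs, i
  double precision :: homepi, pisum, avepi

  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)

  avepi = 0.d0
  pisum = 0.d0
  do i = 1, ROUNDS
     homepi = dboard(DARTS)          ! local Monte Carlo estimate of pi
     ! one reduction per round: every rank sends its estimate to rank 0
     call MPI_REDUCE(homepi, pisum, 1, MPI_DOUBLE_PRECISION, &
                     MPI_SUM, 0, MPI_COMM_WORLD, ierr)
     if (rank == 0) avepi = (avepi*(i-1) + pisum/nprocs) / i
  end do
  if (rank == 0) print *, 'Estimated pi = ', avepi

  call MPI_FINALIZE(ierr)

contains

  double precision function dboard(n)
    ! throw n darts at the unit square and count hits inside the circle
    ! (all ranks use the default random seed here; a real code would seed
    !  each rank differently)
    integer, intent(in) :: n
    integer :: j, hits
    double precision :: x, y
    hits = 0
    do j = 1, n
       call random_number(x)
       call random_number(y)
       if (x*x + y*y <= 1.d0) hits = hits + 1
    end do
    dboard = 4.d0*dble(hits)/dble(n)
  end function dboard

end program mpi_pi_sketch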

Now, with np=1 I obtain:

real    0m0.063s
user    0m0.049s
sys     0m0.017s

while if I use np=4 I obtain:

real    0m1.139s
user    0m0.352s
sys     0m0.045s

This is unrealistic :-( ! Is it possible that, using /usr/bin/mpirun, MPI doesn't work properly? I did not touch the example code, so the problem cannot be the program itself. I have ifort:

ifort (IFORT) 14.0.1 20131008 Copyright (C) 1985-2013 Intel Corporation. All rights reserved.

and gfortran:

GNU Fortran (Ubuntu/Linaro 4.8.1-10ubuntu9) 4.8.1 Copyright (C) 2013 Free Software Foundation, Inc.

Finally:

/usr/bin/mpirun -V
mpirun (Open MPI) 1.4.5

The error I got when using only the mpirun command was:

/opt/intel/composer_xe_2013_sp1.1.106/mpirt/bin/intel64/mpirun: 96: .: Can't open /opt/intel/composer_xe_2013_sp1.1.106/mpirt/bin/intel64/mpivars.sh

This is the reason why I use /usr/bin/mpirun to launch the code, as suggested here.

Many thanks for your help.

Upvotes: 1

Views: 261

Answers (2)

Anthony Scemama

Reputation: 1593

Your test takes 0m0.063s with a single core!

You will never get reasonable timings with such a short benchmark: communication is expensive, typically around 1 microsecond for a one-sided interprocess communication on the same host, whereas a floating-point operation is on the order of a nanosecond. If you add the time spent waiting in barriers, etc., you see that inter-process communication does not have the same granularity as shared memory at all.

By increasing the ROUNDS variable in your program, you should aim for benchmarks where the fastest run takes at least 10 seconds, so that the time spent in initialization and finalization becomes negligible.

Note that MPI_REDUCE is an expensive call, whose cost increases with the total number of processes (as opposed to MPI_SEND, for example). You should move it outside of the loop so that you do much more computation than communication.
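
For illustration, a sketch of that restructuring, reusing the assumed names from the sketch in the question; it replaces the ROUNDS loop, with mypi added as an extra variable:

! All the local work is done first, and MPI_REDUCE is called a single time
! after the loop instead of once per round.
double precision :: mypi            ! new accumulator (assumed name)

mypi = 0.d0
do i = 1, ROUNDS                    ! increase ROUNDS until the fastest run takes ~10 s
   mypi = mypi + dboard(DARTS)      ! purely local work, no communication
end do
mypi = mypi / ROUNDS                ! average of this rank's estimates

! one reduction for the whole run
call MPI_REDUCE(mypi, pisum, 1, MPI_DOUBLE_PRECISION, &
                MPI_SUM, 0, MPI_COMM_WORLD, ierr)

if (rank == 0) print *, 'Estimated pi = ', pisum / nprocs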

If your goal is not learning MPI but parallelizing a Monte Carlo code (or some other "embarrassingly parallel" code), you should have a look at the ZeroMQ library (http://zeromq.org), which has bindings for many languages, including Fortran. With this library you get fault tolerance (if one process crashes, your run continues) and the possibility of flexible resources (you can attach and detach clients whenever you want). This is very useful because you don't need to wait for all the resources on the cluster to be free before your calculation starts: just submit multiple jobs that connect to the same server! You can check out these slides: http://irpf90.ups-tlse.fr/files/parallel_programming.pdf, which show a client/server implementation of Pi using pipes, sockets and XML/RPC. You can do the same using Fortran and ZeroMQ with no effort.

Upvotes: 5

HenryTK

Reputation: 1287

Since you only have one error message to work with, you will want to find out why it can't open /opt/intel/composer_xe_2013_sp1.1.106/mpirt/bin/intel64/mpivars.sh.

Check that this file exists and that its permissions allow it to be read and executed by whatever user is running the process.
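
For example, using the path from your error message, something along these lines should tell you whether the file is there and readable, and if it is, sourcing it (the Intel mpirun script sources it with ".", as the error shows) should set up the environment:

ls -l /opt/intel/composer_xe_2013_sp1.1.106/mpirt/bin/intel64/mpivars.sh
. /opt/intel/composer_xe_2013_sp1.1.106/mpirt/bin/intel64/mpivars.sh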

Upvotes: 1
