Reputation: 1032
I am trying to test a Fortran MPI program to see how much speed I can gain from MPI compared with serial programming.
I first compile the program with:
mpif90 mpi_pi_reduce.f -o mpi_pi
and then, because I had a problem locating mpirun,
I launch the program as follows:
/usr/bin/mpirun -np 4 ./mpi_pi
Now, with np=1 I obtain:
real 0m0.063s
user 0m0.049s
sys 0m0.017s
while if I use np=4 I obtain:
real 0m1.139s
user 0m0.352s
sys 0m0.045s
which is unrealistic :-(!
Is it possible that, using /usr/bin/mpirun
, MPI doesn't work properly? I did not touch the example code, so the problem cannot be the program itself.
I have ifort:
ifort (IFORT) 14.0.1 20131008
Copyright (C) 1985-2013 Intel Corporation. All rights reserved.
and gfortran:
GNU Fortran (Ubuntu/Linaro 4.8.1-10ubuntu9) 4.8.1
Copyright (C) 2013 Free Software Foundation, Inc.
Finally:
/usr/bin/mpirun -V
mpirun (Open MPI) 1.4.5
The error I had when using only the mpirun
command was:
/opt/intel/composer_xe_2013_sp1.1.106/mpirt/bin/intel64/mpirun: 96: .: Can't open /opt/intel/composer_xe_2013_sp1.1.106/mpirt/bin/intel64/mpivars.sh
This is the reason why I use /usr/bin/mpirun
to launch the code, as suggested here.
Many thanks for your help.
Upvotes: 1
Views: 261
Reputation: 1593
Your test takes 0m0.063s with a single core!
You will never get reasonable timings with such a short benchmark: communication is expensive, typically around 1 microsecond for a one-sided inter-process message on the same host, whereas a floating-point operation is on the order of a nanosecond. If you add the time spent waiting in barriers, etc., you see that inter-process communication does not at all have the same granularity as shared memory.
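A quick back-of-envelope calculation (using the illustrative latency and flop-time figures above, which are rough orders of magnitude, not measurements) shows why such a short run is dominated by communication:

```python
# Rough cost model: ~1 microsecond per inter-process message vs
# ~1 nanosecond per floating-point operation.
comm_latency = 1.0e-6  # seconds per one-sided inter-process message (same host)
flop_time = 1.0e-9     # seconds per floating-point operation

# How many flops of useful work each message "costs":
flops_per_message = comm_latency / flop_time
print(round(flops_per_message))
```

So every message you send needs on the order of a thousand flops of real work to amortize it, which a 63-millisecond benchmark cannot provide.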
By increasing the ROUNDS variable in your program, you should target benchmarks of at least 10 seconds for the fastest run, in order to make the time spent in initialization and finalization negligible.
Note that MPI_REDUCE is an expensive call whose cost grows with the total number of processes (as opposed to MPI_SEND, for example). You should move it outside the loop, so that you do much more computation than communication.
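Here is a sketch of that restructuring in Python (a stand-in for the Fortran example, with hypothetical names `local_hits`, `nprocs`, and `n_per_rank`): each "rank" accumulates its hit count locally, and the reduction happens exactly once at the end, where the single MPI_REDUCE would go:

```python
import random

def local_hits(n, seed):
    """One 'rank' worth of work: count darts landing inside the unit quarter circle."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits

nprocs = 4            # stand-in for the MPI communicator size
n_per_rank = 100_000  # hypothetical per-rank sample count (the ROUNDS analogue)

# Each rank accumulates locally; the single "reduce" happens once at the end,
# i.e. one MPI_REDUCE after the loop instead of one per iteration.
total_hits = sum(local_hits(n_per_rank, seed=rank) for rank in range(nprocs))
pi_est = 4.0 * total_hits / (nprocs * n_per_rank)
print(pi_est)
```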
If your goal is not learning MPI but parallelizing a Monte Carlo code (or some "embarrassingly parallel" code), you should have a look at the ZeroMQ library (http://zeromq.org), which has bindings in many languages, including Fortran. With this library you get fault tolerance (if one process crashes, your run continues) and flexible resources (you can attach and detach clients whenever you want). This is very useful because you don't need to wait for all the resources on the cluster to be free before your calculation starts: just submit multiple jobs that connect to the same server! You can check out these slides: http://irpf90.ups-tlse.fr/files/parallel_programming.pdf, where a client/server implementation of Pi using pipes, sockets and XML-RPC is shown. You can do the same in Fortran with ZeroMQ with little effort.
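To make the client/server idea concrete without pulling in ZeroMQ, here is a minimal stand-in using only Python's standard library pipes (not the actual code from the slides): each client process draws its own samples and reports one number back; the server just sums:

```python
import random
from multiprocessing import Pipe, Process

def client(conn, n, seed):
    # "Client": draw n random points, send the hit count back over the pipe.
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n)
               if rng.random() ** 2 + rng.random() ** 2 <= 1.0)
    conn.send(hits)
    conn.close()

n_clients, n_per_client = 4, 50_000
pipes, procs = [], []
for seed in range(n_clients):
    parent_end, child_end = Pipe()
    p = Process(target=client, args=(child_end, n_per_client, seed))
    p.start()
    pipes.append(parent_end)
    procs.append(p)

# "Server": collect one partial sum per client, then combine.
total = sum(conn.recv() for conn in pipes)
for p in procs:
    p.join()

pi_est = 4.0 * total / (n_clients * n_per_client)
print(pi_est)
```

Because each client only ever sends one number, a crashed client costs you one partial sum rather than the whole run, which is the fault-tolerance property mentioned above.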
Upvotes: 5
Reputation: 1287
Since you only have one error message to work with, you will want to find out why it can't open /opt/intel/composer_xe_2013_sp1.1.106/mpirt/bin/intel64/mpivars.sh.
Check that this file exists and that its permissions allow it to be read by whichever user runs the process (the script is sourced with the "." command, which requires read permission rather than execute permission).
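Something like the following (using the path from your error message) will tell you which case you are in:

```shell
# Check whether the file the Intel mpirun is trying to source is readable
f=/opt/intel/composer_xe_2013_sp1.1.106/mpirt/bin/intel64/mpivars.sh
if [ -r "$f" ]; then
    echo "readable: $f"
    ls -l "$f"    # inspect owner and permission bits
else
    echo "missing or unreadable: $f"
fi
```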
Upvotes: 1