How do I use mpirun's -machine flag?
To select which cluster node to execute on, I figured out how to use mpirun's -machinefile option like this:
> mpirun -machinefile $HOME/utils/Host_file -np <integer> <executable-filename>
Host_file contains a list of the nodes, one on each line.
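As a small sketch, assuming Host_file holds one node name per line as described, the process count can be derived from the file instead of typed by hand. machinefile_cmd is a hypothetical helper name; it only prints the resulting command, so it is safe to try without launching anything.

```shell
# Hypothetical helper: print an mpirun command whose -np equals the number
# of nodes listed in a machinefile (one node name per line).
machinefile_cmd() {
    hostfile=$1                    # e.g. "$HOME/utils/Host_file"
    exe=$2                         # program to launch
    np=$(grep -c . "$hostfile")    # count non-blank lines only
    printf 'mpirun -machinefile %s -np %s %s\n' "$hostfile" "$np" "$exe"
}

# Example: machinefile_cmd "$HOME/utils/Host_file" FPU
```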
But I want to submit a whole bunch of processes with different arguments, and I don't want them running on the same node. That is, I want to do something like this:
> mpirun -machinefile $HOME/utils/Host_file -np 1 filename 1
> nano Host_file   # change the first node name
> mpirun -machinefile $HOME/utils/Host_file -np 1 filename 2
> nano Host_file
> mpirun -machinefile $HOME/utils/Host_file -np 1 filename 3
> nano Host_file
...
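Rather than editing Host_file with nano between runs, one option is to give each mpirun call its own one-line machinefile generated from the full node list. This is a sketch that assumes -machinefile behaves as shown above when the file names a single node; run_one_per_node and the MPIRUN override are hypothetical names, not part of MPICH. Jobs receive the arguments 1, 2, 3, ... in host-file order and run concurrently.

```shell
# Sketch: start one single-process mpirun job per node in a host file,
# passing 1, 2, 3, ... as the program's argument. Each job gets its own
# one-line machinefile, so Host_file never has to be hand-edited.
# run_one_per_node and MPIRUN are hypothetical names.
run_one_per_node() {
    hosts_file=$1                     # one node name per line, like Host_file
    exe=$2                            # program to launch, e.g. FPU
    mpirun_cmd=${MPIRUN:-mpirun}      # set MPIRUN=echo for a dry run
    workdir=$(mktemp -d) || return 1  # holds the per-job machinefiles
    i=1
    while IFS= read -r node || [ -n "$node" ]; do
        [ -z "$node" ] && continue               # skip empty lines
        mf="$workdir/machines.$i"
        printf '%s\n' "$node" > "$mf"            # machinefile naming only this node
        # Start in the background so the jobs run concurrently; stdin comes
        # from /dev/null so mpirun cannot consume the rest of the host list.
        $mpirun_cmd -machinefile "$mf" -np 1 "$exe" "$i" < /dev/null &
        i=$((i + 1))
    done < "$hosts_file"
    wait                                         # block until every job exits
    rm -rf "$workdir"                            # remove the per-job machinefiles
}

# Example: run_one_per_node "$HOME/utils/Host_file" FPU
```

Running with MPIRUN=echo first prints the commands that would be issued, which is a cheap way to check the node-to-argument mapping before touching the cluster.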
I could use the -machine flag and then just type a different node for each execution, but I can't get it to work. For example, both
> mpirun -machine node21-ib -np 1 FPU
> mpirun -machine node21 -np 1 FPU
always execute on the master node.
I also tried the -nodes option:
> mpirun -nodes node21-ib -np 1 FPU
> mpirun -nodes node21 -np 1 FPU
But that just executes on my current node.
Similarly, I've tried the -nolocal and -exclude options without success.
So I have a simple question: how do I use the -machine option? Or is there a better way to do this (for a Linux newbie)?
I'm using the following version of MPI, which seems to have surprisingly little documentation on the web (so far, all the documentation I have comes from mpirun --help).
> mpichversion
MPICH Version: 1.2.7
MPICH Release date: $Date: 2005/06/22 16:33:49$
MPICH Patches applied: none
MPICH configure: --with-device=ch_gen2 --with-arch=LINUX -prefix=/usr/local/mvapich-gcc --with-romio --without-mpe -lib=-L/usr/lib64 -Wl,-rpath=/usr/lib64 -libverbs -libumad -lpthread
MPICH Device: ch_gen2
Thanks for your help.
What you need is to specify a hosts file. For example, try this mpirun command:
> mpirun -np 4 -hostfile hosts ./exec
where hosts lists your nodes' IP addresses, one node per line, generally in the form 192.168.1.201:8, where the number after the colon is the maximum number of cores to use on that node. Ideally, you should also install cluster management software such as Torque and Maui.
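Assuming the host:N format described in this answer (one node per line, N being the maximum number of cores on that node), a sketch like the following totals the cores listed in a hosts file, which gives a natural upper bound for -np. total_slots is a hypothetical helper name, and counting a line without :N as one core is an assumption of this sketch.

```shell
# Hypothetical helper: sum the per-node core counts in a "host:N" hosts file,
# e.g. to choose a value for -np. Lines without ":N" count as 1 (assumption).
total_slots() {
    awk -F: '
        /^[ \t]*$/ { next }                              # skip blank lines
        { total += ((NF > 1 && $2 != "") ? $2 : 1) }     # "host:N" -> N, "host" -> 1
        END { print total + 0 }                          # prints 0 for an empty file
    ' "$1"
}

# Example: mpirun -np "$(total_slots hosts)" -hostfile hosts ./exec
```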