Reputation: 59
First off, I have key pairs set up; this is not a passphrase question, though ssh is involved.
I also have MPICH, Hydra, Slurm and lamd installed ... this is a cluster computing question.
Node0 will boot but node1 hangs. I have had this problem for days now. My NFS mirror works just fine and I can run Game of Life on 8 cores on node2 ... that is really cool too, just ask me about it...
BUT, when I want to run on all three nodes together, I hit a password request from each node as node0 uses ssh to send the processes. Again, this is not a passphrase problem: Hydra (Slurm and lamd as well) wants my user password on node1, basically my login credential. I can change that to an MPICHuser account; however, the dilemma will remain.
Unless I create MPICHusers on all three nodes without passwords at all ... can that be done? It seems like the epitome of security risk.
So the question is: can I automate the password entry whenever the @ password prompt pops up, in a way that won't hang lamboot?
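To narrow down which nodes are actually asking for a password, a quick check along these lines might help. It is only a sketch: check_batch_ssh and the SSH_CMD override are names I made up, and the node names are the ones from this post. BatchMode=yes tells ssh not to prompt at all, so a host that would have asked for a password fails immediately instead of hanging.

```shell
# Hypothetical helper: report which hosts accept key-based, prompt-free logins.
# SSH_CMD is an assumed override hook (handy for testing); it defaults to ssh.
check_batch_ssh() {
    status=0
    for host in "$@"; do
        # BatchMode=yes disables password/passphrase prompts, so ssh fails
        # fast instead of waiting on the terminal.
        if ${SSH_CMD:-ssh} -o BatchMode=yes -o ConnectTimeout=5 "$host" true \
            </dev/null >/dev/null 2>&1; then
            echo "$host: ok"
        else
            echo "$host: would prompt or fail"
            status=1
        fi
    done
    return $status
}
```

Running `check_batch_ssh node0 node1 node2` from node0 should then show which hosts still need their key setup looked at.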
It is late, and looking at what I have makes me wonder if Slurm is the new culprit.
Here is more or less what I am looking at:
me@wherever:/mirror/GameOfLife$ mpiexec.hydra -f /mirror/machinefile -n 10 ./life 10 10 30
[mpiexec@wherever] HYDU_process_mfile_token (utils/args/args.c:296): token node0 not supported at this time
[mpiexec@wherever] HYDU_parse_hostfile (utils/args/args.c:343): unable to process token
[mpiexec@wherever] mfile_fn (ui/mpich/utils.c:336): error parsing hostfile
[mpiexec@wherever] match_arg (utils/args/args.c:152): match handler returned error
[mpiexec@wherever] HYDU_parse_array (utils/args/args.c:174): argument matching returned error
[mpiexec@wherever] parse_args (ui/mpich/utils.c:1596): error parsing input array
[mpiexec@wherever] HYD_uii_mpx_get_parameters (ui/mpich/utils.c:1648): unable to parse user arguments
[mpiexec@wherever] main (ui/mpich/mpiexec.c:153): error parsing parameters
me@wherever:/mirror/GameOfLife$
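The "token node0 not supported" message comes from the hostfile parser, which seems to suggest that some line in /mirror/machinefile carries more than one whitespace-separated entry. As far as I understand it, Hydra wants one host per line, optionally followed by :N for the number of processes. Here is a rough checker under that assumption; check_hostfile is a made-up name, and the check is deliberately conservative, so it may flag extra tokens that some Hydra versions do accept.

```shell
# Hypothetical helper: flag hostfile lines that are not "host" or "host:N".
# Assumes one host per line; '#' comment lines and blank lines are skipped.
check_hostfile() {
    awk '
        /^[ \t]*#/ { next }    # skip comment lines
        NF == 0    { next }    # skip blank lines
        NF != 1 || $1 !~ /^[A-Za-z0-9._-]+(:[0-9]+)?$/ {
            printf "line %d looks suspicious: %s\n", NR, $0
            bad = 1
        }
        END { exit bad }       # non-zero exit if anything was flagged
    ' "$1"
}
```

For example, `check_hostfile /mirror/machinefile` would print any line that has several names on it (such as all three nodes on one line) and return a non-zero status.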
Upvotes: 0
Views: 235
Reputation: 1
That is not the problem. I am looking toward Slurm compatibility. Several things happen at nearly the same time in a specific order. The handler has to have terminal control in an instant so the master node can begin sending. Before I added Slurm, the Hydra machinefile was working, but node0 could not "grab" the keyboard. Where should Slurm look for an equivalent file? I am wondering if I should remove Hydra.
Upvotes: 0