Jonathan Engwall


runtime-lamboot suggests I automate ssh - Ubuntu

First off, I have key pairs set up, so this is not a passphrase question, although SSH is involved.

I also have MPICH, Hydra, SLURM and lamd installed; this is a cluster computing question.

Node0 boots, but node1 hangs. I have had this problem for days now. My NFS mirror works just fine, and I can run Game Of Life on 8 cores on node2 ... that is really cool too, just ask me about it...

BUT, when I want to run on all three nodes together, I hit a password request from each node as node0 uses SSH to send out the processes. Again, this is not a passphrase problem: Hydra (and Slurm and lamd as well) wants my user password for node1, basically my login credential. I could switch to an MPICHuser account, but the dilemma would remain.

The only alternative I see is to create MPICHuser accounts on all three nodes with no password at all. Can that be done? It seems like the epitome of a security risk.

So the question is: can I automate supplying the password whenever the @ password prompt pops up, in a way that won't hang lamboot?
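Hydra's ssh launcher, like lamboot, needs every node to accept a login with no interactive prompt at all. A minimal sketch, assuming the OpenSSH client and the same user name on every node, that lists which hosts (node0, node1 and node2 from above) would still stop at a password or passphrase prompt. The function names are hypothetical helpers, not part of MPICH, Hydra or LAM:

```python
import subprocess
from typing import Callable, Iterable, List


def ssh_batch_ok(host: str, timeout: int = 5) -> bool:
    """Return True if `ssh host true` succeeds with no interactive prompt.

    BatchMode=yes makes OpenSSH fail instead of asking for a password or
    passphrase, so any prompt shows up here as a non-zero exit status.
    """
    cmd = ["ssh", "-o", "BatchMode=yes", "-o", f"ConnectTimeout={timeout}",
           host, "true"]
    try:
        result = subprocess.run(cmd, stdin=subprocess.DEVNULL,
                                stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL,
                                timeout=timeout + 5)
    except (FileNotFoundError, subprocess.TimeoutExpired):
        # No ssh binary on this machine, or the remote side never answered.
        return False
    return result.returncode == 0


def hosts_needing_password(hosts: Iterable[str],
                           check: Callable[[str], bool] = ssh_batch_ok) -> List[str]:
    """Hypothetical helper: return the hosts where non-interactive login fails."""
    return [h for h in hosts if not check(h)]
```

Run from node0 as hosts_needing_password(["node0", "node1", "node2"]), any host it returns is one where key-based login is not yet accepted for that user, typically a sign that node0's public key is not installed in that node's authorized_keys. The check is injectable so the logic can be exercised without a network.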

It is late, and looking at what I have makes me wonder whether Slurm is the new culprit.

Here is more or less what I am looking at:

me@wherever:/mirror/GameOfLife$ mpiexec.hydra -f /mirror/machinefile -n 10 ./life 10 10 30
[mpiexec@wherever] HYDU_process_mfile_token (utils/args/args.c:296): token node0 not supported at this time
[mpiexec@wherever] HYDU_parse_hostfile (utils/args/args.c:343): unable to process token
[mpiexec@wherever] mfile_fn (ui/mpich/utils.c:336): error parsing hostfile
[mpiexec@wherever] match_arg (utils/args/args.c:152): match handler returned error
[mpiexec@wherever] HYDU_parse_array (utils/args/args.c:174): argument matching returned error
[mpiexec@wherever] parse_args (ui/mpich/utils.c:1596): error parsing input array
[mpiexec@wherever] HYD_uii_mpx_get_parameters (ui/mpich/utils.c:1648): unable to parse user arguments
[mpiexec@wherever] main (ui/mpich/mpiexec.c:153): error parsing parameters
me@wherever:/mirror/GameOfLife$
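The first error line shows Hydra giving up while parsing /mirror/machinefile, on a token "node0" it does not support, which appears to mean some line holds more than a plain host entry. The Hydra documentation describes machinefile lines of the form host or host:count, with # comments. A minimal sketch under that assumption, flagging lines whose extra tokens are not key=value options; the helper name and the sample file are hypothetical, not the actual machinefile:

```python
import re
from typing import List, Tuple

# Accepts "host" or "host:count" (e.g. "node1" or "node1:4").
HOST_RE = re.compile(r"^[A-Za-z0-9._-]+(:\d+)?$")


def check_machinefile(text: str) -> List[Tuple[int, str]]:
    """Return (line number, message) pairs for suspicious machinefile lines."""
    problems: List[Tuple[int, str]] = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding space
        if not line:
            continue  # blank or comment-only line
        host, *extras = line.split()
        if not HOST_RE.match(host):
            problems.append((lineno, f"bad host entry {host!r}"))
        for tok in extras:
            # key=value options are not validated here; any other extra
            # token is the kind the Hydra error above complains about.
            if "=" not in tok:
                problems.append((lineno, f"unexpected token {tok!r}"))
    return problems


# Hypothetical file: the first line has an /etc/hosts-style alias mixed in.
sample = "192.168.1.10 node0\nnode1:4\n# node2 is added later\n"
print(check_machinefile(sample))  # prints [(1, "unexpected token 'node0'")]
```

Pointing it at the contents of /mirror/machinefile would show whether any line carries such an extra token.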


Answers (1)

Jonathan Engwall

That is not the problem. I am looking into Slurm compatibility. Several things happen at nearly the same time, in a specific order: the handler has to take terminal control instantly so that the master node can begin sending. Before I added Slurm, the Hydra machinefile was working, but node0 could not "grab" the keyboard. Where should Slurm look for an equivalent file? I am wondering whether I should remove Hydra.
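Slurm does not read a Hydra machinefile. One mechanism it does offer is the SLURM_HOSTFILE environment variable, which srun reads when tasks are placed with --distribution=arbitrary, one host name per task. A minimal sketch, assuming the host:count machinefile format, that expands such a file into that one-host-per-line form; the function name is hypothetical and this is only one possible way to carry the host list over:

```python
from typing import List


def to_slurm_hostfile(machinefile_text: str) -> str:
    """Expand "host" / "host:count" lines into one host name per task."""
    hosts: List[str] = []
    for raw in machinefile_text.splitlines():
        entry = raw.split("#", 1)[0].strip()  # ignore comments and blank lines
        if not entry:
            continue
        # Only the first token matters here; "node0:2" -> host "node0", count "2".
        host, _, count = entry.split()[0].partition(":")
        # A bare host name counts as a single process slot.
        hosts.extend([host] * (int(count) if count else 1))
    return "".join(h + "\n" for h in hosts)


print(to_slurm_hostfile("node0:2\nnode1\nnode2:3\n"), end="")
```

The output could be saved to a file, exported as SLURM_HOSTFILE, and the program started with srun --distribution=arbitrary and a task count equal to the number of lines; the job still has to be allocated on those nodes.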
