philm

Reputation: 885

How to execute MPI not as root when executing in /

I am running into a few issues with SSH and executing MPICH. From some previous questions that I asked, I was able to progress to the point of executing the mpi_hello.c program.

For reference, I am working on following this tutorial on setting up MPICH: https://help.ubuntu.com/community/MpichCluster

I created a directory called clusterFiles in the root directory (/) and created a user called clusterUser (clusteruser) on all of the nodes. I exported clusterFiles over NFS and mounted it on all of the nodes. I also changed the ownership of clusterFiles to clusterUser on the master node, and changed the home directory of clusterUser to /clusterFiles.
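
For reference, the setup steps looked roughly like the following. The hostname master and the NFS export options are placeholders from memory, not necessarily exactly what I typed:

    # On the master node: create and export the shared directory
    sudo mkdir /clusterFiles
    sudo chown clusteruser /clusterFiles
    echo "/clusterFiles *(rw,sync,no_subtree_check)" | sudo tee -a /etc/exports
    sudo exportfs -ra

    # On every other node: mount the export ("master" is a placeholder hostname)
    sudo mkdir /clusterFiles
    sudo mount master:/clusterFiles /clusterFiles

    # Make /clusterFiles the home directory of clusteruser
    sudo usermod -d /clusterFiles clusteruser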

I created an SSH key for clusterUser on the master node and added the key to the authorized keys list. I installed keychain on all of the nodes, and on the master node I edited .bashrc as specified in the guide (I copied the snippet from the guide into .bashrc).
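
The key setup was along these lines; the keychain snippet is what I remember copying from the guide, so the exact lines may differ:

    # On the master node, as clusteruser
    ssh-keygen -t rsa
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

    # Lines added to ~/.bashrc so keychain loads the key at login
    keychain ~/.ssh/id_rsa
    . ~/.keychain/$HOSTNAME-sh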

I also installed MPICH2 and GCC on all nodes.
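
The install step was just the packaged versions (on newer Ubuntu releases the package may be called mpich rather than mpich2):

    sudo apt-get install mpich2 gcc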

I edited the machine file for my specific cluster.
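
The machinefile is just one node per line; apart from rgcluster2blade1, the hostnames below are placeholders for my actual node names, and the :N suffix is the number of processes to start on that node:

    rgcluster2blade1:2
    rgcluster2blade2:2
    rgcluster2blade3:2
    rgcluster2blade4:1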

However, when I go to execute the MPI hello world program (mpi_hello.c), this is where the errors occur.

I copied and pasted the code on the guide into a .c file and called it mpi_hello.c (This was done on the master node).

In the last part of the guide, he just calls mpicc [arguments] and mpiexec [arguments]. However, when I go to call mpicc, I need to run sudo mpicc [arguments]. Is this a problem I should be concerned with, or is this the proper way it should be done?
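
Concretely, the last step of the guide boils down to something like this, run from /clusterFiles:

    cd /clusterFiles
    mpicc mpi_hello.c -o mpi_hello
    mpiexec -n 7 -f machinefile ./mpi_hello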

When I run mpiexec (without sudo), I receive the following errors:

    clusteruser@rgcluster2blade1:~$ mpiexec -n 7 -f machinefile ./mpi_hello
    [mpiexec@rgcluster2blade1] HYDU_parse_hostfile (./utils/args/args.c:323): unable to open host file: machinefile
    [mpiexec@rgcluster2blade1] mfile_fn (./ui/mpich/utils.c:341): error parsing hostfile
    [mpiexec@rgcluster2blade1] match_arg (./utils/args/args.c:153): match handler returned error
    [mpiexec@rgcluster2blade1] HYDU_parse_array (./utils/args/args.c:175): argument matching returned error
    [mpiexec@rgcluster2blade1] parse_args (./ui/mpich/utils.c:1609): error parsing input array
    [mpiexec@rgcluster2blade1] HYD_uii_mpx_get_parameters (./ui/mpich/utils.c:1660): unable to parse user arguments
    [mpiexec@rgcluster2blade1] main (./ui/mpich/mpiexec.c:153): error parsing parameters

Are these files something that I forgot to install? At first, I thought I needed sudo in front of mpiexec. But when I run sudo mpiexec [arguments], it "runs" but connects to the cluster over SSH as root, when I need it to connect as clusteruser.
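
To rule out a simple path or permission problem, I can at least check, as clusteruser, whether mpiexec should be able to see the file at all:

    pwd                   # should be the directory that contains machinefile
    ls -l machinefile     # does it exist here, and can clusteruser read it?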

My main concern is that he is not executing his commands as root. I am wondering if there is a step that is implied, or at least a command that I was supposed to execute but didn't.

Also, I noticed that when I tried changing the ownership of clusterFiles to clusterUser on the other nodes, I would get an Operation not permitted error (I was root when I ran this command). My thinking is that since I changed the ownership on the master node, it propagated to the other nodes since they have the same username, so I was effectively changing the ownership to itself. Is this thinking correct, or is there more to it than that?
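
My understanding is that NFS tracks ownership by numeric UID/GID, so a quick way to see whether the ownership really "propagated" is to compare the numeric owners on the master and on one of the other nodes, e.g.:

    # Run on the master node and on one of the compute nodes, then compare
    ls -ln /clusterFiles
    id clusteruser        # numeric UID/GID of clusteruser on this node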

Edit:

From the suggestion of user Zulan, I have checked the permissions of the machinefile. Interestingly enough, the ownership is still set to rgcluster2blade1. I decided to run sudo chown -R clusteruser /clusterFiles in order to make all files/folders within clusterFiles owned by clusteruser. I have done this on the master node only. I will be checking the other nodes.
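
The command I used, plus a quick check afterwards, was roughly:

    sudo chown -R clusteruser /clusterFiles
    ls -l /clusterFiles    # everything should now show clusteruser as owner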

Edit 2:

OK, so after checking the rest of the cluster (I am only experimenting with 4 nodes right now before doing the whole thing), I found that 2 of the nodes were giving ownership to another user besides clusteruser: they were giving it to the user render. I attempted to run the sudo chown command, but on both I received an Operation not permitted error.
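
To see how the accounts differ across nodes, I am comparing the numeric IDs on each node with something like the following (render showing up suggests its UID matches clusteruser's UID on the master):

    id clusteruser          # numeric UID/GID of clusteruser on this node
    getent passwd render    # which UID the user render maps to on this node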

Upvotes: 1

Views: 1937

Answers (1)

philm

Reputation: 885

Just as an update: since I discovered that the GID and UID were all messed up, I decided to delete the user and create a new account. Before doing anything, I made sure to check and, if needed, change the UID and GID of the users so that they are the same on all nodes. I cannot remember the command off the top of my head; I will look for it later. Once I find it, I will update this answer.
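
The kind of command I mean is probably usermod/groupmod or something similar; a rough sketch of that approach, where the ID 1001 is just an example and not necessarily what I used:

    # Run as root on each node where the IDs differ
    groupmod -g 1001 clusteruser
    usermod -u 1001 -g 1001 clusteruser
    # Fix up shared files that still carry the old numeric IDs
    chown -R clusteruser:clusteruser /clusterFiles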

Afterwards, I proceeded with the guide and everything worked fine.

Upvotes: 1
