Reputation: 1
I'm working on a small Raspberry Pi cluster. My host program creates IP packet fragments and sends them to multiple relay programs; the relays receive those fragments and forward them to the destination using raw sockets. Because of the raw sockets, my relay programs must run with sudo. My setup consists of an RPi 3 B v2 and an RPi 2 B v1. SSH is already set up and the nodes can SSH in without a password, although I must run ssh-agent and ssh-add my keys on each node. I've managed to run a program sending its rank from one node to another (two different RPis). I run my MPI programs MPMD-style; since I have only two RPis, I run the host and one relay on node #1 and a relay on node #2. The host program takes the path of the file to be sent as a command-line argument.
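For context, each relay opens its raw socket roughly like this (a minimal sketch; the actual fragment-forwarding logic is omitted):
#include <cerrno>
#include <cstring>
#include <iostream>
#include <netinet/in.h>
#include <sys/socket.h>

int main() {
    // IPPROTO_RAW implies IP_HDRINCL, i.e. we supply complete IP headers
    // (the pre-built fragments) ourselves. This call fails with EPERM
    // unless the process has CAP_NET_RAW, which is why the relays
    // currently need sudo.
    int sock = socket(AF_INET, SOCK_RAW, IPPROTO_RAW);
    if (sock == -1) {
        std::cerr << "socket: " << std::strerror(errno) << '\n';
        return 1;
    }
    // ... sendto() each fragment toward the destination here ...
    return 0;
}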
If I run:
mpirun --oversubscribe -n 1 --host localhost /home/pi/Desktop/host /some.jpeg : -n 2 --host localhost,rpi2 /home/pi/Desktop/relay
it runs, but the relays obviously fail because they can't open raw sockets without sudo.
If I run:
mpirun --oversubscribe -n 1 --host localhost /home/pi/Desktop/host /some.jpeg : -n 2 --host localhost,rpi2 sudo /home/pi/Desktop/relay
the relays report a world size of 1 and the host program hangs (presumably because sudo strips the OMPI_*/PMIx environment variables set by the launcher, so each relay falls back to a singleton MPI_COMM_WORLD).
If I run:
mpirun --oversubscribe -n 1 --host localhost sudo /home/pi/Desktop/host /some.jpeg : -n 2 --host localhost,rpi2 sudo /home/pi/Desktop/relay
the host and all relays report a world size of 1.
I found a similar problem here: OpenMPI / mpirun or mpiexec with sudo permission
Following the short answer there, I ran:
mpirun --oversubscribe -n 1 --host localhost /home/pi/Desktop/host /some.jpeg : -n 2 --host localhost,rpi2 sudo -E /home/pi/Desktop/relay
which results in:
[raspberrypi:00979] OPAL ERROR: Unreachable in file ext2x_client.c at line 109
[raspberrypi:00980] OPAL ERROR: Unreachable in file ext2x_client.c at line 109
*** An error occurred in MPI_Init
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
[raspberrypi:00979] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
[raspberrypi:00980] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[32582,1],1]
Exit code: 1
--------------------------------------------------------------------------
I've run sudo visudo, and my file on both nodes looks like this:
# User privilege specification
root ALL=(ALL:ALL) ALL
pi ALL = NOPASSWD:SETENV: /etc/alternatives/mpirun
pi ALL=NOPASSWD:SETENV: /usr/bin/orterun
pi ALL=NOPASSWD:SETENV: /usr/bin/mpirun
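These entries are intended to let the pi user run the MPI launcher itself under sudo, password-free and with the environment preserved, e.g. (a hypothetical invocation, not one of the attempts above):
sudo -E mpirun --oversubscribe -n 1 --host localhost /home/pi/Desktop/host /some.jpeg : -n 2 --host localhost,rpi2 /home/pi/Desktop/relay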
When I run everything on one node, it just works:
sudo mpirun --allow-run-as-root --oversubscribe -n 1 --host localhost /home/pi/Desktop/host /some.jpeg : -n 2 --host localhost,localhost /home/pi/Desktop/relay
//host
#include <mpi.h>
#include <filesystem>
#include <iostream>

int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv);
    int world_size = []() {
        int size;
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        return size;
    }();
    int id = []() {
        int id;
        MPI_Comm_rank(MPI_COMM_WORLD, &id);
        return id;
    }();
    if (argc != 2) {
        std::cerr << "Filepath not passed\n";
        MPI_Finalize();
        return 0;
    }
    const std::filesystem::path filepath(argv[1]);
    if (not std::filesystem::exists(filepath)) {
        std::cerr << "File doesn't exist\n";
        MPI_Finalize();
        return 0;
    }
    std::cout << "World size: " << world_size << '\n';
    MPI_Finalize();
    return 0;
}
//relay
#include <mpi.h>
#include <iostream>

int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv);
    int world_size = []() {
        int size;
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        return size;
    }();
    int id = []() {
        int id;
        MPI_Comm_rank(MPI_COMM_WORLD, &id);
        return id;
    }();
    std::cout << "World size: " << world_size << '\n';
    MPI_Finalize();
    return 0;
}
How do I configure the nodes to allow them to run MPI programs with sudo?
Upvotes: 0
Views: 1457
Reputation: 1
The easiest way to resolve the problem is to set file capabilities on the binary. This still poses a security risk, but it's not as serious as making the program suid root. To grant a program the capabilities needed to open raw sockets:
setcap cap_net_raw,cap_net_admin+eip program
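For example, applied to the relay binary from the question (run on each node; getcap verifies the result):
sudo setcap cap_net_raw,cap_net_admin+eip /home/pi/Desktop/relay
getcap /home/pi/Desktop/relay
After that, the relays no longer need sudo, so the plain mpirun invocation from the question should work:
mpirun --oversubscribe -n 1 --host localhost /home/pi/Desktop/host /some.jpeg : -n 2 --host localhost,rpi2 /home/pi/Desktop/relay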
Upvotes: 0