Ali

Reputation: 41

Using SLURM to run TCP client, server

I have a Docker image that needs to run in an environment where I have no admin privileges, using Slurm 17.11.8 on RHEL. I am using udocker to run the container.

In this container, there are two applications that need to run:

[1] ROS simulation (there is a rosnode that is a TCP client talking to [2])

[2] An executable (TCP server)

So [1] and [2] need to run together, and they share some common files as well. Usually, I run them in separate terminals, but I have no idea how to do this with SLURM.

Possible solutions:

(A) Use two containers of the same image, but then their files will be stored locally in each container. I could use volumes instead, but this requires me to change my code significantly and may break compatibility when I am not running it in containers (e.g. in Eclipse).

(B) Use a bash script to launch two terminals and run [1] and [2]. Then srun this script.

I am looking at (B) but have no idea how to approach it. The other approaches I found address sequential execution of multiple processes. I need these to be concurrent.

If it helps, I am using xfce-terminal though I can switch to other terminals such as Gnome, Konsole.

Upvotes: 0

Views: 452

Answers (1)

chuck

Reputation: 745

This is a shot in the dark since I don't work with udocker.

In your Slurm submit script, to be submitted with sbatch, you could allocate enough resources for both jobs to run on the same node (so you just need to reference localhost for your client/server). Start your first process in the background with something like:

udocker run container_name container_args &

The & should start the first container in the background.

You would then start the second container:

udocker run 2nd_container_name more_args

This would run without & to keep the process in the foreground. Ideally, when the second container completes, the script would complete and Slurm cleanup would kill the first container. If both containers exit cleanly on their own, you can put a wait at the end of the script so that it does not return until the background one has finished.
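Putting those steps together, a submit script might look roughly like the sketch below. It is only an illustration: the #SBATCH resource values are guesses, run_server and run_client are hypothetical stand-ins for the real udocker commands, the ordering (server in the background, client in the foreground) is an assumption, and the one-second pause is a crude way to let the server start listening first.

```shell
#!/bin/bash
#SBATCH --nodes=1            # assumed: keep both processes on one node so localhost works
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2    # assumed: one core per process

# Hypothetical stand-ins; replace the bodies with the real udocker invocations.
run_server() { sleep 2; echo "server done"; }
run_client() { echo "client done"; }

log_dir=$(mktemp -d)         # scratch directory for the two log files

run_server > "$log_dir/server.log" 2>&1 &   # first process, in the background
sleep 1                                     # crude pause so the server can come up
run_client > "$log_dir/client.log" 2>&1     # second process, in the foreground
wait                                        # block until the background process exits too

echo "logs written to $log_dir"
```

Sending each process to its own log file replaces the two terminals: the output of both can be inspected after the job ends.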

Caveats:

  • Depending on how Slurm is configured, processes may not be properly cleaned up at the end. You may need to capture the PID of the first udocker as a variable and kill it before you exit.
  • The first container may still be processing when the second completes. You may need to add a sleep command at the end of your submission script to give it time to finish.
  • Any number of other gotchas may exist that you will need to find and hopefully work around.
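For the first caveat, capturing the PID and killing the background process could be sketched as follows. The sleep and echo commands are hypothetical stand-ins for the two udocker commands. Whether killing the udocker process also stops everything it started inside the container is an assumption that would need checking on the actual system.

```shell
#!/bin/bash
# Hypothetical stand-ins for the two udocker commands.
server_cmd=(sleep 30)                  # long-running background process
client_cmd=(echo "client finished")    # foreground process

"${server_cmd[@]}" &
server_pid=$!                          # PID of the most recent background command

cleanup() {
    kill "$server_pid" 2>/dev/null || true   # no-op if it has already exited
    wait "$server_pid" 2>/dev/null || true   # reap it so nothing is left behind
}
trap cleanup EXIT                      # also run cleanup when the script exits

"${client_cmd[@]}"
client_status=$?                       # keep the foreground exit code for later use

cleanup
```

The cleanup function is safe to call more than once, so the explicit call and the EXIT trap do not conflict.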

Upvotes: 1
