Oli

Reputation: 1142

How to communicate between nodes of a cluster?

This isn't a question about a specific cluster environment, but rather about the general case of distributing software over multiple nodes of a cluster.

I understand that most HPC clusters use some kind of workload manager to distribute jobs across multiple nodes. From my limited research, Slurm seems to be a popular choice, but others are also in use.

I can see how this is useful if you want to run n independent tasks. But what if you wanted to run tasks that communicate with one another?

If I were developing an application that was split across two or more machines I could just design a simple protocol (or use an existing one) and send/receive messages over something like TCP/IP. If things got really complicated it wouldn't be too hard to design a simple message bus or message hub to accommodate more than two machines.
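For illustration, here is a minimal sketch of that kind of ad-hoc messaging, using Python's standard socket module with a simple length-prefixed framing. The port number and message framing are arbitrary choices made up for this example:

```python
import socket
import struct
import sys

PORT = 5000  # arbitrary port chosen for this sketch


def recv_exact(sock, n: int) -> bytes:
    # Keep reading until exactly n bytes have arrived.
    data = b""
    while len(data) < n:
        chunk = sock.recv(n - len(data))
        if not chunk:
            raise ConnectionError("peer closed connection")
        data += chunk
    return data


def send_msg(sock, payload: bytes) -> None:
    # Simple protocol: 4-byte big-endian length, then the payload.
    sock.sendall(struct.pack("!I", len(payload)) + payload)


def recv_msg(sock) -> bytes:
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, length)


if __name__ == "__main__":
    if sys.argv[1] == "server":
        with socket.create_server(("", PORT)) as srv:
            conn, _ = srv.accept()
            print(recv_msg(conn).decode())
    else:  # usage: python protocol.py client <server-host>
        with socket.create_connection((sys.argv[2], PORT)) as conn:
            send_msg(conn, b"hello from another machine")
```

This works fine when I know the hostnames or IP addresses of both machines in advance.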

Firstly, in an HPC cluster is it sensible to use TCP, or is this generally not used for performance reasons?

Secondly, in a non-cluster environment I know beforehand the IP addresses of the machines involved, but on a cluster, I delegate the decision of which physical machines my software is deployed on to a workload manager like Slurm. So how can I "wire up" the nodes? How does MPI achieve this, or is it not using TCP/IP to allow communication between nodes?

Sorry if this question is a little open-ended for StackOverflow, I'm happy to move it somewhere else if there's a more appropriate place to ask questions like these.

Upvotes: 1

Views: 2073

Answers (1)

PilouPili

Reputation: 2699

If I were developing an application that was split across two or more machines I could just design a simple protocol (or use an existing one) and send/receive messages over something like TCP/IP

That is why MPI exists: so that not everyone has to reinvent the wheel (and this wheel represents several thousand hours of engineering time; it is not your basic chariot wheel, it has been over some very bumpy roads...).
Ultimately, though, that is essentially what MPI does under the hood (if you want your communications to go over TCP, see OpenMPI TCP).
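To show what this looks like from the application side, here is a minimal point-to-point exchange using mpi4py (assuming mpi4py and an MPI implementation such as OpenMPI are installed; the same pattern exists in C via MPI_Send/MPI_Recv):

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()           # this process's id within the job
host = MPI.Get_processor_name()  # the node this rank landed on

if rank == 0:
    # Rank 0 sends a message; MPI decides how it travels (shared memory,
    # TCP, InfiniBand, ...) depending on where rank 1 is running.
    comm.send(f"hello from {host}", dest=1, tag=0)
elif rank == 1:
    msg = comm.recv(source=0, tag=0)
    print(f"rank 1 on {host} received: {msg}")
```

Launched with something like mpirun -np 2 python example.py (or through the scheduler, see below), neither rank needs to know the other's IP address.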

Firstly, in an HPC cluster is it sensible to use TCP, or is this generally not used for performance reasons?

There are other means of communication than TCP (shared memory, Myrinet, OpenFabrics, ...; see the OpenMPI FAQ). In HPC there are quite a few interconnect solutions on the market (have a look at the Top 500 list).

So how can I "wire up" the nodes? How does MPI achieve this, or is it not using TCP/IP to allow communication between nodes?

The wiring is managed by the workload manager (have a look at the Slurm configuration or LoadLeveler). MPI just "inherits" that context, because on an HPC cluster you typically no longer launch with mpirun but rather with srun or runjob (instead of doing something like "Specify the machines running program using MPI"). A concrete sketch of that inherited context is shown below.
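To make the "inherit from that context" point concrete: when Slurm starts your tasks with srun, each process can discover its placement from environment variables, and an MPI library obtains equivalent information from the scheduler (via its process-management interface) instead of from hard-coded addresses. A minimal sketch, assuming the script is launched under Slurm:

```python
import os

# These variables are set by Slurm for each task started with srun;
# none of them is something you have to configure by hand.
print("job id:   ", os.environ.get("SLURM_JOB_ID"))
print("this task:", os.environ.get("SLURM_PROCID"))
print("all nodes:", os.environ.get("SLURM_JOB_NODELIST"))
print("tasks:    ", os.environ.get("SLURM_NTASKS"))
```

Run with something like srun -N 2 -n 4 python placement.py and each task prints where it ended up; at no point do you specify the machines' addresses yourself.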

Upvotes: 2
