Kashyap

Reputation: 17544

Load distribution to instances of a Perl script running on each node of a cluster

I have a Perl script (call it "worker") installed on each node/machine (4 total) of a cluster (each running RHEL). The script is configured as a Red Hat Cluster service, which means the RH cluster manager ensures that exactly one instance of the script is running as long as at least one node in the cluster is up.

I have X amount of work to be done once a day, which this script does. So far X has been small enough that a single instance of the script could handle it. But the load is going to increase, and along with High Availability (already implemented using RHCS), I now also need load distribution.

The question is: how do I do that?

Of course I have a way to split the work into n parts of size X/n each. The options I had in mind:

Create a new load distributor, which splits the work into jobs of size X/n, AND one of the following:

  1. Create a named pipe on the network file system (which is mounted and visible on all nodes) and post all jobs to the pipe. Have each worker script on each node read (atomically) from the pipe and do the work. OR
  2. Have each worker script on each node listen on a TCP socket, and have the load distributor send jobs to these sockets in a round-robin (or some other) fashion. (A rough sketch of such a worker follows this list.)
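For what it's worth, here is roughly what I imagine the option #2 worker looking like (untested; port 7777 and the one-job-per-line protocol are just placeholders, not anything decided yet):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use IO::Socket::INET;

    # Listen for jobs from the load distributor; one job description per line.
    my $server = IO::Socket::INET->new(
        LocalPort => 7777,      # placeholder port
        Proto     => 'tcp',
        Listen    => 5,
        ReuseAddr => 1,
    ) or die "cannot listen: $!";

    while (my $conn = $server->accept) {
        while (my $job = <$conn>) {
            chomp $job;
            # ... do the X/n slice of work described by $job ...
            print $conn "done $job\n";    # acknowledge back to the distributor
        }
        close $conn;
    }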

The theoretical problem with #1 is that we've observed some nasty latency problems with NFS, and I'm not even sure NFS supports IPC via named pipes across machines.

The theoretical problem with #2 is that I'd have to implement some monitoring to ensure that each worker is running and listening, and being a Perl noob, I'm not sure how easy that is.

I personally prefer the load distributor creating a pool of jobs and the workers pulling from it, rather than the load distributor tracking each worker and pushing work to each one. Any other options?

I'm open to new ideas as well. :)

Thanks!

-- edit --

Using Perl 5.8.8; to be precise: This is perl, v5.8.8 built for x86_64-linux-thread-multi

Upvotes: 1

Views: 183

Answers (1)

mpeters

Reputation: 4778

If you want to keep it simple, use a database to store the jobs, and have each worker lock the table, grab the jobs it needs, then unlock and let the next worker do its thing. This isn't the most scalable solution, since you'll have lock contention, but with just 4 nodes it should be fine.
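Something along these lines, as a rough sketch (untested; the jobs table, its columns, the batch size, and the MySQL connection details are all made up for illustration):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use DBI;

    # Hypothetical table: jobs(id INT PRIMARY KEY, payload TEXT,
    #                          status VARCHAR(10), owner VARCHAR(64))
    my $dbh = DBI->connect('dbi:mysql:database=workdb;host=dbhost',
                           'user', 'pass', { RaiseError => 1 });

    my $host = `hostname`; chomp $host;

    # Claim a batch of pending jobs while holding the table lock, so no two
    # workers can grab the same rows.
    $dbh->do('LOCK TABLES jobs WRITE');
    my $rows = $dbh->selectall_arrayref(
        q{SELECT id, payload FROM jobs WHERE status = 'pending' LIMIT 10});
    if (@$rows) {
        my $ids = join ',', map { $_->[0] } @$rows;   # integer ids from the table
        $dbh->do("UPDATE jobs SET status = 'taken', owner = ? WHERE id IN ($ids)",
                 undef, $host);
    }
    $dbh->do('UNLOCK TABLES');

    # Do the real work outside the lock so the other workers aren't blocked.
    for my $row (@$rows) {
        my ($id, $payload) = @$row;
        # ... run the existing per-chunk logic on $payload ...
        $dbh->do(q{UPDATE jobs SET status = 'done' WHERE id = ?}, undef, $id);
    }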

But if you start going down this road, it might make sense to look at a dedicated job-queue system like Gearman.
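With the Gearman::Worker / Gearman::Client CPAN modules and a gearmand server running somewhere all the nodes can reach, the split looks roughly like this (sketch only; the function name process_chunk and the job-server address are placeholders):

    # On each node: a worker that registers a function and pulls jobs.
    use strict;
    use warnings;
    use Gearman::Worker;

    my $worker = Gearman::Worker->new;
    $worker->job_servers('jobserver:4730');
    $worker->register_function(process_chunk => sub {
        my $job   = shift;
        my $chunk = $job->arg;
        # ... run the existing per-chunk logic on $chunk ...
        return "done";
    });
    $worker->work while 1;    # block forever, picking up jobs as they arrive

    # In the load distributor: submit one task per X/n chunk and wait.
    use Gearman::Client;

    my $client = Gearman::Client->new;
    $client->job_servers('jobserver:4730');
    my $taskset = $client->new_task_set;
    $taskset->add_task(process_chunk => $_) for @chunks;
    $taskset->wait;           # returns once every chunk has been processed

The workers pull jobs from the queue rather than having anything pushed at them, which matches the pool-and-pull model you said you prefer, and gearmand takes care of handing each job to exactly one worker.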

Upvotes: 2
