sigpwned

Reputation: 7453

Keeping a long-running task running on exactly one machine of a cluster with failover?

I have a low-CPU queue-processing task that I need to keep running for a potentially long period of time. I'd like to run it in a high-availability clustered environment so that, if the machine running it fails, the task "switches" to another machine. What is the best way to make sure the task runs on exactly one machine in the cluster at a time, with seamless failover on machine failure?

Right now, I'm planning to use JGroups to implement this feature. I'll keep one channel for each task, and only the channel leader will execute the task while the other members "follow along." Then, if the channel leader ever changes, the new channel leader picks up where the last one left off.

Has anyone used JGroups to solve this problem? What was your experience?

Upvotes: 1

Views: 211

Answers (1)

Nicholas

Reputation: 16066

You might get some inspiration and direction from the JBoss 4.2.3+ Clustered Singleton. They define a service that runs on one, and only one, node in a cluster of nodes. If that node fails, or is ejected from the cluster, a new node is assigned the singleton. The underlying implementation [of JBoss Clustering] is JGroups.
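To make the idea concrete, here is a minimal sketch of the "first member of the view runs the task" rule, written in plain Java so it runs without any clustering library. It is not the JBoss implementation. The names `SingletonNode`, `onViewChange`, and `onCheckpoint` are hypothetical, and plain strings stand in for cluster addresses. In a real JGroups setup you would, as far as I know, apply the same check inside the view-change callback, since the first member of a view is conventionally the coordinator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** One simulated cluster node; it runs the task only while it is first in the view. */
class SingletonNode {
    private final String name;        // hypothetical stand-in for a cluster address
    private boolean running = false;  // true while this node owns the task
    private long resumeFrom = 0;      // last checkpoint this node has heard about

    SingletonNode(String name) { this.name = name; }

    boolean isRunning() { return running; }
    long resumeFrom() { return resumeFrom; }

    /** Followers "follow along" by recording the leader's latest checkpoint. */
    void onCheckpoint(long offset) { resumeFrom = offset; }

    /** Called with every new membership view (ordered, oldest member first). */
    void onViewChange(List<String> members) {
        boolean leader = !members.isEmpty() && members.get(0).equals(name);
        if (leader && !running) {
            running = true;   // became leader: start from the last known checkpoint
        } else if (!leader && running) {
            running = false;  // lost leadership: stop so only one node does the work
        }
    }

    public static void main(String[] args) {
        SingletonNode a = new SingletonNode("A");
        SingletonNode b = new SingletonNode("B");
        List<String> view = new ArrayList<>(Arrays.asList("A", "B"));
        a.onViewChange(view);
        b.onViewChange(view);   // A is first in the view, so only A runs

        b.onCheckpoint(42);     // B hears that A has processed up to offset 42
        view.remove("A");       // A fails and drops out of the view
        b.onViewChange(view);   // B is now first, so it takes over

        System.out.println("B running: " + b.isRunning() + ", resumes from " + b.resumeFrom());
    }
}
```

Note that this only covers the election rule. During a network partition each side can elect its own leader, so "exactly one" also needs some fencing or a merge policy, which the sketch leaves out.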

Upvotes: 1
