Shay

Reputation: 474

Schedule YARN application on active/standby nodes

I would like to have a cluster that is split into two sub-clusters: "active" nodes and "standby" nodes. Normally, when an application is scheduled, I would like it to run on the "active" nodes. But if no "active" node is healthy, I would like it to run on the "standby" nodes.

Is there a way to achieve such behavior in YARN?

To give a bit more detail, the "active" nodes of the cluster will be located in a different zone than the "standby" nodes (but not far from them). We are trying to achieve multi-zone high availability for our application: after a disaster in the "active" zone, the application should be recovered and scheduled in the "standby" zone.

Upvotes: 0

Views: 175

Answers (1)

tk421

Reputation: 5957

To route jobs to specific nodes, you will need Node Labels. The Capacity Scheduler has supported them since Hadoop 2.6; for the Fair Scheduler, I think support was planned for Hadoop 3.x.
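Node Labels only pin an application to a partition; they do not fail over on their own, so something on the submitting side has to decide which label to request. Here is a minimal sketch of that decision, assuming the nodes are labeled "active" and "standby" as in the question. The function name `choose_node_label` is hypothetical, and the dictionary shape (a `state` string and a `nodeLabels` list per node) is an assumption modeled on the ResourceManager REST nodes listing; check it against what your Hadoop version actually returns.

```python
from typing import Mapping, Optional, Sequence


def choose_node_label(
    nodes: Sequence[Mapping],
    primary: str = "active",
    fallback: str = "standby",
) -> Optional[str]:
    """Pick the node label to request at submission time.

    Assumption: each entry in `nodes` has a "state" string and a
    "nodeLabels" list. Returns `primary` if at least one node with that
    label is RUNNING, otherwise `fallback` if one of its nodes is RUNNING,
    otherwise None.
    """

    def has_running(label: str) -> bool:
        # A label is usable if any node carrying it reports RUNNING.
        return any(
            node.get("state") == "RUNNING" and label in node.get("nodeLabels", [])
            for node in nodes
        )

    if has_running(primary):
        return primary
    if has_running(fallback):
        return fallback
    return None


if __name__ == "__main__":
    # Hypothetical snapshot: the active zone is down, the standby zone is up.
    snapshot = [
        {"nodeHostName": "a1", "state": "LOST", "nodeLabels": ["active"]},
        {"nodeHostName": "s1", "state": "RUNNING", "nodeLabels": ["standby"]},
    ]
    print(choose_node_label(snapshot))
```

The returned label would then be passed as the node label expression when the application is submitted. Note that this check only happens at submission (or resubmission) time; an application already running in the "active" partition still needs its own restart or recovery path.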

Another option to consider is YARN Federation, where you run more than one YARN cluster: the second cluster would be in zone 2, and you can re-route your job there if zone 1 has issues.

Upvotes: 1
