Sudhir Tupe

Reputation: 13

How to achieve fault tolerance (recovery) for TaskManagers in Apache Flink?

JobManager recovery is achieved using ZooKeeper, but what happens if a TaskManager fails? How do I recover from this? Does the JobManager automatically recover TaskManagers?

Upvotes: 1

Views: 556

Answers (1)

Fabian Hueske

Reputation: 18997

In general, the JobManager takes care of recovering from TaskManager failures. How this is done depends on your setup.

  • If you run Flink on YARN, the JobManager will start a new TaskManager when it realizes that a TaskManager has died, and reassign the tasks to it.
  • If you run Flink stand-alone on a cluster, you have to keep one or more stand-by TaskManagers running yourself. The JobManager will assign the tasks of the failed TM to a stand-by TM, so enough stand-by TMs must be up to take over those tasks.
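To make the stand-alone case concrete, here is a toy model in Python of why spare capacity matters. It is only a sketch and not Flink code: the `TaskManager` class, the `reassign_after_failure` function, and the slot counts are all made up for illustration, and it ignores everything else Flink does during recovery, such as restoring state.

```python
from dataclasses import dataclass, field


@dataclass
class TaskManager:
    # Hypothetical stand-in for a TaskManager: a name, a fixed number of
    # slots, and the tasks currently occupying those slots.
    name: str
    slots: int
    tasks: list = field(default_factory=list)

    def free_slots(self) -> int:
        return self.slots - len(self.tasks)


def reassign_after_failure(task_managers, failed_name):
    """Move the failed TM's tasks onto free slots of the surviving TMs.

    Returns the surviving TaskManagers. Raises RuntimeError when the
    survivors do not have enough free slots for all orphaned tasks.
    """
    failed = next(tm for tm in task_managers if tm.name == failed_name)
    survivors = [tm for tm in task_managers if tm.name != failed_name]

    # Check capacity first so nothing is moved if recovery cannot complete.
    if sum(tm.free_slots() for tm in survivors) < len(failed.tasks):
        raise RuntimeError("not enough stand-by slots to recover the job")

    pending = list(failed.tasks)
    for tm in survivors:
        while pending and tm.free_slots() > 0:
            tm.tasks.append(pending.pop(0))
    return survivors


cluster = [
    TaskManager("tm-1", slots=2, tasks=["map-0", "map-1"]),
    TaskManager("tm-2", slots=2, tasks=["sink-0", "sink-1"]),
    TaskManager("standby", slots=2),  # idle stand-by TM
]
recovered = reassign_after_failure(cluster, "tm-1")
```

With the idle `standby` entry in the list, the two tasks from `tm-1` end up on it. Remove that entry and the same call raises `RuntimeError`, which mirrors the situation where no stand-by TM is available in a stand-alone cluster.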

Upvotes: 3
