Pasticho
Reputation: 1

Automating Apache Flink TaskManager restarts

I currently have a very busy server where Flink is competing with other applications for resources. This causes the TaskManager to crash and all the jobs to go into a failed state. I usually just rerun taskmanager.sh, and it works until it crashes again.

I know the real solution is to get more resources, and that's coming. Until then, I need help understanding the best way to have taskmanager.sh rerun automatically whenever the TaskManager crashes.

I've searched online and found two options: (1) write a script and schedule it as a cron job, or (2) run taskmanager.sh under systemd and configure it to restart the process when it detects that it is down (https://www.cyberciti.biz/faq/how-to-restart-a-process-out-of-crontab-on-a-linuxunix/).

I haven't written scripts before, so before delving into it I wanted to know whether either of these options is doable with Flink's TaskManager, or whether there's an easier solution.

System: RHEL

Thanks!

Manually rerunning taskmanager.sh works, but the TaskManager crashes again less than 5 minutes later.

Upvotes: 0

Views: 106

Answers (1)

David Anderson
Reputation: 43717

The standard approach is to deploy Flink on Kubernetes or YARN, in which case restarting failed TaskManagers is handled automatically. But setting up the TaskManager as a service managed by systemd should be workable.
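
As a rough illustration of the systemd route, here is a minimal sketch of a unit file, not a tested configuration. The install path /opt/flink, the flink user, and the unit name flink-taskmanager.service are assumptions to adjust for your setup. It relies on the start-foreground mode of taskmanager.sh, which keeps the JVM attached so that systemd can see when it exits.

```ini
# /etc/systemd/system/flink-taskmanager.service  (hypothetical file name)
[Unit]
Description=Apache Flink TaskManager
After=network.target

[Service]
Type=simple
# Assumed service account; use whichever user owns your Flink install.
User=flink
# Assumed install location; point this at your own Flink bin directory.
ExecStart=/opt/flink/bin/taskmanager.sh start-foreground
# Restart after a non-zero exit or when the process is killed by a signal.
Restart=on-failure
# Wait 10 seconds between attempts so a crash loop does not hammer the server.
RestartSec=10

[Install]
WantedBy=multi-user.target
```

After saving the file, something like `systemctl daemon-reload` followed by `systemctl enable --now flink-taskmanager` should register and start it, and `journalctl -u flink-taskmanager` shows the output. The unit wraps the script in place, so there is no need to copy taskmanager.sh anywhere.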

Upvotes: 0
