Reputation: 1
I currently have a very busy server where Flink competes with other applications for resources. This causes the TaskManager to crash and all the jobs to go into a failed state. I usually just rerun taskmanager.sh, and it works until the next crash.
I know the real solution is more resources, and that's coming. Until then, I need help understanding the best way to restart the TaskManager automatically when it crashes.
I've searched online and found two options:
1. Create a script that checks for the process and run it from a cron job.
2. Set up taskmanager.sh as a systemd service and configure it to restart when it goes down (https://www.cyberciti.biz/faq/how-to-restart-a-process-out-of-crontab-on-a-linuxunix/)
I have not written scripts before, so before delving into it I wanted to know whether either of these options is doable with Flink's TaskManager, or whether there's an easier solution.
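For illustration, option 1 could be sketched roughly as below. This is a minimal sketch, not a tested production script. It assumes Flink is installed under /opt/flink (a hypothetical path) and that the standalone scripts write the TaskManager PID to /tmp/flink-<user>-taskexecutor.pid, which should be verified on the actual installation. The function names are made up for this example.

```shell
#!/usr/bin/env bash
# Watchdog sketch: start the TaskManager if its PID file does not point
# at a live process. All paths below are assumptions; adjust them.
FLINK_HOME="${FLINK_HOME:-/opt/flink}"
PID_FILE="${PID_FILE:-/tmp/flink-${USER:-$(id -un)}-taskexecutor.pid}"

# Returns 0 if the last PID recorded in the file belongs to a live process.
# kill -0 sends no signal; it only checks that the process exists and that
# we are allowed to signal it, so run this as the user that owns Flink.
pid_alive() {
  local pid_file="$1" pid
  [ -r "$pid_file" ] || return 1
  pid="$(tail -n 1 "$pid_file")"
  [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}

# Runs the given start command only when no live process is found.
ensure_running() {
  local pid_file="$1"
  shift
  if pid_alive "$pid_file"; then
    return 0
  fi
  echo "$(date '+%F %T') TaskManager not running, starting it" >&2
  "$@"
}

# Only act when a Flink installation is actually present at FLINK_HOME.
if [ -x "$FLINK_HOME/bin/taskmanager.sh" ]; then
  ensure_running "$PID_FILE" "$FLINK_HOME/bin/taskmanager.sh" start
fi
```

Saved as, say, /usr/local/bin/flink-tm-watchdog.sh (a hypothetical name) and made executable, it could be run every minute with a crontab entry such as `* * * * * /usr/local/bin/flink-tm-watchdog.sh >> /tmp/flink-tm-watchdog.log 2>&1`.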
System: RHEL
Thanks!
Manually rerunning taskmanager.sh works, but the TaskManager crashes again less than 5 minutes later.
Upvotes: 0
Views: 106
Reputation: 43717
The standard approach to this is to deploy Flink on Kubernetes or YARN, in which case failed TaskManagers are restarted automatically. But setting up the TaskManager as a service managed by systemd should be workable.
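As a rough sketch of the systemd route, a unit file along these lines might do. The unit name, the `flink` user, and the /opt/flink path are assumptions, not taken from the question. It uses the `start-foreground` mode of taskmanager.sh so that systemd supervises the process directly, and `Restart=always` so that it is started again after a crash.

```ini
# /etc/systemd/system/flink-taskmanager.service (hypothetical name and paths)
[Unit]
Description=Apache Flink TaskManager
After=network.target

[Service]
Type=simple
# Assumed service account; use whichever user owns the Flink installation.
User=flink
ExecStart=/opt/flink/bin/taskmanager.sh start-foreground
# Restart the process whenever it exits, waiting 10 seconds between attempts.
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
```

After saving the file, `systemctl daemon-reload` followed by `systemctl enable --now flink-taskmanager` should start the service and keep it enabled across reboots. If Java is not on the service user's default path, an `Environment=JAVA_HOME=...` line in the `[Service]` section may also be needed.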
Upvotes: 0