Jackie

Reputation: 11

How to reduce the time before a task can be killed when its state is TASK_LOST?

I have been running Marathon, Mesos, and Docker together without trouble, but I recently discovered a problem. When mesos-slave encounters an exception, the task's state in Marathon changes to TASK_LOST, and the task cannot be killed until about 15 minutes have passed.

I tested this by manually rebooting the operating system that runs the mesos-slave service, Docker, and the task. The task state shown in the Marathon UI then became "Unscheduled(100%)", and the task could not be killed, either automatically or manually, until about 15 minutes had passed. My question is: how can I reduce this time? I tried adding these Marathon startup command-line arguments:

task_launch_confirm_timeout=30000
scale_apps_interval = 30000
task_lost_expunge_initial_delay = 30000
task_launch_timeout = 30000

and this mesos-slave startup command-line argument:

recovery_timeout=1mins

but it doesn't work for me.

Upvotes: 0

Views: 397

Answers (1)

janisz

Reputation: 6371

To change how long executors wait before terminating themselves after the Mesos agent process fails, configure --recovery_timeout. The Mesos documentation describes the flag as follows:

Amount of time allotted for the agent to recover. If the agent takes longer than recovery_timeout to recover, any executors that are waiting to reconnect to the agent will self-terminate. (default: 15mins)
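
As a rough illustration of how the flag might be passed, here is a minimal sketch that assembles an agent command line with a shorter recovery timeout. The master address and work directory are hypothetical placeholders, not values from the question, and the script only prints the command instead of starting the agent, so it is safe to try.

```shell
#!/bin/sh
# Hypothetical values for illustration; replace with your own cluster settings.
MASTER="zk://127.0.0.1:2181/mesos"
WORK_DIR="/var/lib/mesos"

# Duration in the same format the question uses (the documented default is 15mins).
RECOVERY_TIMEOUT="1mins"

# Build the agent command line. Note the leading "--" on every flag.
AGENT_CMD="mesos-slave --master=${MASTER} --work_dir=${WORK_DIR} --recovery_timeout=${RECOVERY_TIMEOUT}"

# Print the command for review rather than executing it.
echo "${AGENT_CMD}"
```

Going by the flag description above, this timeout governs executors that are waiting to reconnect to the agent, so it is worth checking that the flag really reaches the agent process with its leading dashes.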

Upvotes: 2
