Reputation: 13414
I have an application that uses Resque to run some long-running jobs. Sometimes they take 8 hours or more to complete.
In situations where a job fails, is there a way to monitor Resque itself to see if the job is running? I know I can update the job's status in a database table (or in Redis itself), but I want to know whether the job is still running so I can kill it if necessary.
The specific things I need to do are:
Upvotes: 2
Views: 3541
Reputation: 1529
The god solution can end up killing workers that aren't actually stuck or misbehaving. I started addressing this issue with a different approach: you register a handler that does whatever you want (kill, email, send a pager alert, etc.) whenever Resque problems come up.
If a job doesn't get processed during a certain timeframe (either because resque is stuck, the queue has an insane backlog, or resque just isn't running at all), the handler will get invoked. Feel free to poke at the README for more details.
https://github.com/shaiguitar/resque_stuck_queue#readme
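To make the idea concrete, here is a minimal sketch of the heartbeat-and-handler pattern described above. It is not the resque_stuck_queue API (see the README for that); the class name `StuckQueueWatchdog` and its methods are hypothetical and only illustrate the mechanism: a tiny job records a heartbeat whenever a worker processes it, and a monitor invokes your handler once the heartbeats have been silent for too long.

```ruby
# Hypothetical illustration only; NOT the resque_stuck_queue API.
class StuckQueueWatchdog
  # max_silence: seconds allowed between heartbeats before the handler fires.
  # started_at:  baseline used until the first heartbeat arrives.
  def initialize(max_silence:, started_at: Time.now, &handler)
    @max_silence = max_silence
    @last_beat   = started_at
    @handler     = handler
  end

  # Would be called by a small heartbeat job each time a worker runs it.
  def beat(now = Time.now)
    @last_beat = now
  end

  # Would be called periodically by the monitor. Invokes the handler with the
  # number of silent seconds and returns true when the queue looks stuck.
  def check(now = Time.now)
    silence = now - @last_beat
    return false if silence <= @max_silence

    @handler.call(silence)
    true
  end
end
```

A handler could then be registered like `StuckQueueWatchdog.new(max_silence: 3600) { |secs| warn "no job processed for #{secs.round}s" }`. Note that the handler fires for a stuck worker, a large backlog, or Resque not running at all, since all three look the same from the heartbeat's point of view.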
Upvotes: 1
Reputation: 230521
The Resque GitHub repository has this hidden gem: a god task that does exactly this. It watches your workers and kills stale ones.
https://github.com/resque/resque/blob/master/examples/god/stale.god
# This will ride alongside god and kill any rogue stale worker
# processes. Their sacrifice is for the greater good.

WORKER_TIMEOUT = 60 * 10 # 10 minutes

Thread.new do
  loop do
    begin
      `ps -e -o pid,command | grep [r]esque`.split("\n").each do |line|
        parts = line.split(' ')
        next if parts[-2] != "at"
        started = parts[-1].to_i
        elapsed = Time.now - Time.at(started)

        if elapsed >= WORKER_TIMEOUT
          ::Process.kill('USR1', parts[0].to_i)
        end
      end
    rescue
      # don't die because of stupid exceptions
      nil
    end

    sleep 30
  end
end
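If you want to adapt this to jobs that legitimately run for hours, it may help to pull the parsing out into small functions with a configurable timeout. The sketch below is my own restructuring, not part of the Resque repository: `stale_candidate` and `stale_pids` are hypothetical names, and it assumes (as the script above does) that the process title of a busy worker ends in `at <epoch seconds>`. It needs Ruby 2.7+ for `filter_map`.

```ruby
# Hypothetical helpers; assumes the worker's process title ends in "at <epoch>".

# Returns [pid, elapsed_seconds] for one line of `ps -e -o pid,command`,
# or nil when the line does not end in "at <epoch>".
def stale_candidate(line, now: Time.now)
  parts = line.split(' ')
  return nil if parts.length < 3 || parts[-2] != 'at'

  started = Integer(parts[-1], 10, exception: false)
  return nil if started.nil?

  [parts[0].to_i, now - Time.at(started)]
end

# Returns the pids whose current job has been running for at least `timeout`
# seconds. Killing is left to the caller, e.g. Process.kill('USR1', pid).
def stale_pids(ps_output, timeout:, now: Time.now)
  ps_output.each_line.filter_map do |line|
    pid, elapsed = stale_candidate(line, now: now)
    pid if pid && elapsed >= timeout
  end
end
```

With 8-hour jobs, the 10-minute `WORKER_TIMEOUT` above would kill healthy work, so you would pass something above your longest expected job, for example `stale_pids(`ps -e -o pid,command | grep [r]esque`, timeout: 9 * 60 * 60)`. As far as I know, the Resque README documents `USR1` as "immediately kill child but don't exit", so the worker itself survives and moves on to the next job.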
Upvotes: 3