As you can see from the attached image, I've got a couple of workers that seem to be stuck. Those processes shouldn't take longer than a couple of seconds. I'm not sure why they won't clear or how to manually remove them. I'm on Heroku using Resque with Redis-to-Go and HireFire to automatically scale workers.

ruby-on-railsruby-on-rails-3herokuredisresque

Reputation: 25368

How do I clear stuck/stale Resque workers?

As you can see from the attached image, I've got a couple of workers that seem to be stuck. Those processes shouldn't take longer than a couple of seconds.

enter image description here

I'm not sure why they won't clear or how to manually remove them.

I'm on Heroku using Resque with Redis-to-Go and HireFire to automatically scale workers.

Upvotes: 143

Answers (16)

jrochkind

Reputation: 23347

In resque 2.0.0, here's one way that seems to work to remove only actually appearantly-dead workers in resque 2.0.0:

Resque::Worker.all_workers_with_expired_heartbeats.each { |w| w.unregister_worker }

I am not an expert in what's going, it's possible there's a better way to do this or that this will have problems. I'm just trying to figure this out too.

This seems to remove workers that haven't sent a "heartbeat" in much longer than expected from the resque worker list.

If the phantom worker was in a "running" state, then a new entry in the "failed" job queue will be created corresponding to phantom job.

Upvotes: 1

Sandip Subedi

Reputation: 1077

If you use Docker, you can also use this command:

<id> is the worker id.

docker stop <id>

docker start <id>

Upvotes: 0

Will Bryant

Reputation: 343

I ran into this issue and started down the path of implementing a lot of the suggestions here. However, I discovered the root cause that was creating this issue was that I was using the gem redis-rb 3.3.0. Downgrading to redis-rb 3.2.2 prevented these workers from getting stuck in the first place.

Upvotes: 2

Joakim Kolsjö

Reputation: 106

This avoids the problem as long as you have a resque version newer than 1.26.0:

resque: env QUEUE=foo TERM_CHILD=1 bundle exec rake resque:work

Keep in mind that it does not let the currently running job finish.

Upvotes: 0

lloydpick

Reputation: 1649

If you are using newer versions of Resque, you'll need to use the following command as the internal APIs have changed...

Resque::WorkerRegistry.working.each {|work| Resque::WorkerRegistry.remove(work.id)}

Upvotes: 0

Rich Sutton

Reputation: 10784

Here's how you can purge them from Redis by hostname. This happens to me when I decommission a server and workers do not exit gracefully.

Resque.workers.each { |w| w.unregister_worker if w.id.start_with?(hostname) }

Upvotes: 2

ewH

Reputation: 2683

Adding to answer by hagope, I wanted to be able to only unregister workers that had been running for a certain amount of time. The code below will only unregister workers running for over 300 seconds (5 minutes).

Resque.workers.each {|w| w.unregister_worker if w.processing['run_at'] && Time.now - w.processing['run_at'].to_time > 300}

I have an ongoing collection of Resque related Rake tasks that I have also added this to: https://gist.github.com/ewherrmann/8809350

Upvotes: 27

Shai

Reputation: 1529

Started working on https://github.com/shaiguitar/resque_stuck_queue/ recently. It's not a solution to how to fix stuck workers but it addresses the issue of resque hanging/being stuck, so I figured it could be helpful for people on this thread. From README:

"If resque doesn't run jobs within a certain timeframe, it will trigger a pre-defined handler of your choice. You can use this to send an email, pager duty, add more resque workers, restart resque, send you a txt...whatever suits you."

Been used in production and works pretty well for me thus far.

Upvotes: 2

Andrei R

Reputation: 5168

I've cleared them out from redis-cli directly. Luckily redistogo.com allows access from environments outside heroku. Get dead worker ID from the list. Mine was

55ba6f3b-9287-4f81-987a-4e8ae7f51210:2

Run this command in redis directly.

del "resque:worker:55ba6f3b-9287-4f81-987a-4e8ae7f51210:2:*"

You can monitor redis db to see what it's doing behind the scenes.

redis xxx.redistogo.com> MONITOR
OK
1380274567.540613 "MONITOR"
1380274568.345198 "incrby" "resque:stat:processed" "1"
1380274568.346898 "incrby" "resque:stat:processed:c65c8e2b-555a-4a57-aaa6-477b27d6452d:2:*" "1"
1380274568.346920 "del" "resque:worker:c65c8e2b-555a-4a57-aaa6-477b27d6452d:2:*"
1380274568.348803 "smembers" "resque:queues"

Second last line deletes the worker.

Upvotes: 1

user2811637

Reputation: 71

I just did:

% rails c production
irb(main):001:0>Resque.workers

Got the list of workers.

irb(main):002:0>Resque.remove_worker(Resque.workers[n].id)

... where n is the zero based index of the unwanted worker.

Upvotes: 7

Shairon Toledo

Reputation: 2124

You probably have the resque gem installed, so you can open the console and get current workers

Resque.workers

It returns a list of workers

#=> [#<Worker infusion.local:40194-0:JAVA_DYNAMIC_QUEUES,index_migrator,converter,extractor>]

pick the worker and prune_dead_workers, for example the first one

Resque.workers.first.prune_dead_workers

Upvotes: 30

jobwat

Reputation: 9263

I had stuck/stale resque workers here too, or should I say 'jobs', because the worker is actually still there and running fine, it's the forked process that is stuck.

I chose the brutal solution of killing the forked process "Processing" since more than 5min, via a bash script, then the worker just spawn the next in queue, and everything keeps on going

have a look at my script here: https://gist.github.com/jobwat/5712437

Upvotes: 0

Simpleton

Reputation: 6415

In your console:

queue_name = "process_numbers"
Resque.redis.del "queue:#{queue_name}"

Otherwise you can try to fake them as being done to remove them, with:

Resque::Worker.working.each {|w| w.done_working}

EDIT

A lot of people have been upvoting this answer and I feel that it's important that people try hagope's solution which unregisters workers off a queue, whereas the above code deletes queues. If you're happy to fake them, then cool.

Upvotes: 54

joost

Reputation: 6669

I had a similar problem that Redis saved the DB to disk that included invalid (non running) workers. Each time Redis/resque was started they appeared.

Fix this using:

Resque::Worker.working.each {|w| w.done_working}
Resque.redis.save # Save the DB to disk without ANY workers

Make sure you restart Redis and your Resque workers.

Upvotes: 2

hagope

Reputation: 5531

None of these solutions worked for me, I would still see this in redis-web:

0 out of 10 Workers Working

Finally, this worked for me to clear all the workers:

Resque.workers.each {|w| w.unregister_worker}

Upvotes: 226

jBeas

Reputation: 926

Run this command wherever you ran the command to start the server

$ ps -e -o pid,command | grep [r]esque

you should see something like this:

92102 resque: Processing ProcessNumbers since 1253142769

Make note of the PID (process id) in my example it is 92102

Then you can quit the process 1 of 2 ways.

Gracefully use QUIT 92102
Forcefully use TERM 92102

* I'm not sure of the syntax it's either QUIT 92102 or QUIT -92102

Let me know if you have any trouble.

Upvotes: 10

How do I clear stuck/stale Resque workers?

Answers (16)

Related Questions