Reputation: 652
I have a Sidekiq queue with ~100,000 jobs. Some of the jobs fail, which is okay, because they usually succeed when retried by Sidekiq.
However, jobs from the RetrySet are added at the end of the queue, so a long time passes before they are processed again.
How can I put retried jobs at the beginning of the queue, so that they are processed with priority?
Upvotes: 3
Views: 2248
Reputation: 5024
If you want a failed job to be at the beginning of the queue when retried, that means you're fine with a wait time of 0 between retries.
In that case, you should simply wrap the meat of the job execution code in a begin/rescue block and retry right then and there.
Be advised that you'll also need an "out", e.g. retry a fixed number of times or for a specific time interval; otherwise a poisoned message would cause your worker to process the same message indefinitely (as happens with message brokers that return rejected messages to the top of the queue, e.g. RabbitMQ).
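A minimal sketch of that pattern (do_work and the cap of three attempts are placeholders, not from the question):

class MyWorker
  include Sidekiq::Worker

  MAX_ATTEMPTS = 3 # hypothetical cap; pick whatever bound fits your jobs

  def perform(arg)
    attempts = 0
    begin
      do_work(arg) # placeholder for the actual job body
    rescue StandardError
      attempts += 1
      retry if attempts < MAX_ATTEMPTS # re-runs the begin block immediately
      raise # out of attempts; let the failure surface normally
    end
  end
end

Re-raising once the cap is hit means a genuinely poisoned message still fails visibly instead of spinning forever.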
Upvotes: 0
Reputation: 10546
I don't believe there's a great answer for this because, if I remember right, Sidekiq queues are backed by Redis lists, so there's an expectation of FIFO. Retried jobs get pushed onto the same queue, which means they'll always end up at the end.
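You can sanity-check this from a console with Sidekiq's own API (assuming the default queue name; Sidekiq::Queue comes from sidekiq/api):

require 'sidekiq/api'

queue = Sidekiq::Queue.new('default')
queue.size                          # number of jobs waiting in the Redis list
queue.each { |job| puts job.klass } # inspect pending jobs without touching them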
One approach, which isn't great and isn't what I'd recommend, is to add another queue and have job retries get sent to it instead:
# config/sidekiq.yml
---
:queues:
  - default
  - my_worker_retries
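One detail worth noting, since the question is about priority: when queues are listed without weights like this, Sidekiq checks them strictly in the order given, so a variant that drains retries before fresh jobs would simply list the retry queue first:

# config/sidekiq.yml -- hypothetical ordering that favors retries
---
:queues:
  - my_worker_retries
  - default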
Set the worker not to retry:
class MyWorker
  include Sidekiq::Worker

  sidekiq_options retry: false
end
Make sure that your worker predictably raises an error, like the following:
class MyWorker
  include Sidekiq::Worker

  sidekiq_options retry: false

  def perform(arg)
    raise ArgumentError
  end
end
Add some logic to handle that exception and then run this job again through your newly created queue:
class MyWorker
  include Sidekiq::Worker

  sidekiq_options retry: false

  def perform(arg)
    begin
      raise ArgumentError
    rescue ArgumentError => error
      # re-enqueue the failed job on the dedicated retry queue
      MyWorker.set(queue: :my_worker_retries).perform_async(arg)
    end
  end
end
This means that any job that fails and gets queued in the my_worker_retries queue may get stuck in an infinite loop: the job fails, gets rescued, gets re-queued, and fails again. Worse, since you're not using Sidekiq's built-in retry mechanism, there's no back-off algorithm to stop retries from firing as fast as your CPUs can process them.
The whole thing is just brittle.
You can try to prevent this by passing an argument that tracks how many times the job has been retried, so you can stop after some limit:

class MyWorker
  include Sidekiq::Worker

  sidekiq_options retry: false

  MAX_RETRIES = 5

  def perform(arg, retries = 0)
    raise 'Too many retries' if retries >= MAX_RETRIES

    begin
      raise ArgumentError
    rescue ArgumentError => error
      # pass the incremented count along so the next run knows where it is
      MyWorker.set(queue: :my_worker_retries).perform_async(arg, retries + 1)
    end
  end
end
You could extend this with a back-off algorithm of your own:

MyWorker.set(queue: :my_worker_retries).perform_in((retries + 1).hours, arg, retries + 1) # .hours comes from ActiveSupport
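If whole hours are too coarse, a rough exponential schedule works too; the formula below is made up, though the exponent-of-four shape loosely mirrors Sidekiq's own default back-off:

# hypothetical schedule: ~16s, ~31s, ~96s, ~271s, ...
delay = (retries + 1)**4 + 15 # plain seconds, no ActiveSupport needed
MyWorker.set(queue: :my_worker_retries).perform_in(delay, arg, retries + 1)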
None of this is ideal, but it does answer the question. I sure hope there's a better solution than this.
There are some Sidekiq extensions that might work, for example https://github.com/chartmogul/sidekiq-priority_queue, but I haven't used them before.
Upvotes: 2