Reputation: 652
I have a Sidekiq queue with ~100,000 jobs. Some of the jobs fail, which is okay, because they usually succeed when retried by Sidekiq.
However, jobs from the RetrySet are added at the end of the queue, so a long time passes before they are processed again.
How can I put retried jobs at the beginning of the queue, so that they are processed with priority?
Upvotes: 3
Views: 2248
Reputation: 5024
If you want a failed job to be at the beginning of the queue when retried, that means you're fine with a wait time of 0 between retries.
In that case, you should simply wrap the meat of the job execution code in a begin/rescue block and retry right then and there.
Be advised that you'll also need an "out", e.g. retry a fixed number of times or for a specific time interval; otherwise a poisoned message would cause your worker to process the same message indefinitely (as happens with message brokers that return rejected messages to the top of the queue, e.g. RabbitMQ).
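A minimal sketch of that pattern (do_work and the cap of three attempts are placeholders, not from the question):

class MyWorker
  include Sidekiq::Worker

  MAX_ATTEMPTS = 3 # hypothetical cap; pick whatever bound fits your jobs

  def perform(arg)
    attempts = 0
    begin
      do_work(arg) # placeholder for the actual job body
    rescue StandardError
      attempts += 1
      retry if attempts < MAX_ATTEMPTS # re-runs the begin block immediately
      raise # out of attempts; let the failure surface normally
    end
  end
end

Re-raising once the cap is hit means a genuinely poisoned message still fails visibly instead of spinning forever.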
Upvotes: 0
Reputation: 10546
I don't believe there's a great answer for this because, if I remember right, Sidekiq queues are backed by Redis lists, so there's an expectation of FIFO. Retried jobs get pushed onto the same queue, which means they'll always end up at the end.
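You can sanity-check this from a console with Sidekiq's own API (assuming the default queue name; Sidekiq::Queue comes from sidekiq/api):

require 'sidekiq/api'

queue = Sidekiq::Queue.new('default')
queue.size                          # number of jobs waiting in the Redis list
queue.each { |job| puts job.klass } # inspect pending jobs without touching them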
One approach, which isn't great and isn't what I'd recommend, is to add another queue and have job retries get sent to it instead:
# config/sidekiq.yml
---
:queues:
  - default
  - my_worker_retries
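One detail worth noting, since the question is about priority: when queues are listed without weights like this, Sidekiq checks them strictly in the order given, so a variant that drains retries before fresh jobs would simply list the retry queue first:

# config/sidekiq.yml -- hypothetical ordering that favors retries
---
:queues:
  - my_worker_retries
  - default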
Set the worker not to retry:
class MyWorker
  include Sidekiq::Worker

  sidekiq_options retry: false
end
Make sure that your worker predictably raises an error, like the following:
class MyWorker
  include Sidekiq::Worker

  sidekiq_options retry: false

  def perform(arg)
    raise ArgumentError
  end
end
Add some logic to handle that exception and then run this job again through your newly created queue:
class MyWorker
  include Sidekiq::Worker

  sidekiq_options retry: false

  def perform(arg)
    begin
      raise ArgumentError
    rescue ArgumentError => error
      # re-enqueue the failed job on the dedicated retry queue
      MyWorker.set(queue: :my_worker_retries).perform_async(arg)
    end
  end
end
This means that any job that fails and gets queued in the my_worker_retries queue may get stuck in an infinite loop: the job fails, gets rescued, gets re-queued, and fails again. Worse, since you're not using Sidekiq's built-in retry mechanism, there's no back-off algorithm to stop retries from firing as fast as your CPUs can process them.
The whole thing is just brittle.
You can try to prevent this by passing an argument that tracks how many times the job has been retried, so you can stop after some limit:

class MyWorker
  include Sidekiq::Worker

  sidekiq_options retry: false

  MAX_RETRIES = 5

  def perform(arg, retries = 0)
    raise 'Too many retries' if retries >= MAX_RETRIES

    begin
      raise ArgumentError
    rescue ArgumentError => error
      # pass the incremented count along so the next run knows where it is
      MyWorker.set(queue: :my_worker_retries).perform_async(arg, retries + 1)
    end
  end
end
You could extend this with a back-off algorithm of your own:

MyWorker.set(queue: :my_worker_retries).perform_in((retries + 1).hours, arg, retries + 1) # .hours comes from ActiveSupport
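If whole hours are too coarse, a rough exponential schedule works too; the formula below is made up, though the exponent-of-four shape loosely mirrors Sidekiq's own default back-off:

# hypothetical schedule: ~16s, ~31s, ~96s, ~271s, ...
delay = (retries + 1)**4 + 15 # plain seconds, no ActiveSupport needed
MyWorker.set(queue: :my_worker_retries).perform_in(delay, arg, retries + 1)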
None of this is ideal, but it does answer the question. I sure hope there's a better solution than this.
There are some Sidekiq extensions that might work, for example https://github.com/chartmogul/sidekiq-priority_queue, but I haven't used them before.
Upvotes: 2