Some Name

Reputation: 9521

Can Kafka be used as a distributed work queue?

I'm considering using Kafka as a distributed work queue that multiple workers can retrieve tasks from. My original design looks like this:

Work Producer ---> Kafka topic ------worker 1
                                  |
                                  |__worker 2
                                  ...
                                  |__worker n

The problems with this design are:

  1. If a worker takes a task from the topic and immediately commits the offset, then in case of failure the task may not be reprocessed.

  2. If a worker takes a task from the topic and commits the offset only when it finishes, then other workers may also take this task and process it. If the task is long-running, almost all workers will take the same task and process it, which completely defeats the purpose of distributing the work.


I'm looking for a way to "mark" a task in the queue as "in progress" so that it's not consumed by anyone else, while the offset stays uncommitted (because the task may fail and need reprocessing). Is this possible to implement?

Upvotes: 4

Views: 2738

Answers (1)

Michael Heil

Reputation: 18475

If a worker takes a task from the topic and immediately commits the offset, then in case of failure the task may not be reprocessed.

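In that case I recommend using manual commits and disabling auto-commit by setting enable.auto.commit to false in your consumer configuration. Commit the offset only after the task has been processed successfully.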
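A minimal sketch of that commit-after-processing loop, assuming the confluent-kafka Python client. The broker address, group name, topic name and the `process_tasks` / `run_worker` helpers are hypothetical placeholders, not part of the original setup.

```python
from typing import Callable, Optional

# Settings for confluent_kafka.Consumer. Broker address and group name are assumptions.
CONSUMER_CONFIG = {
    "bootstrap.servers": "localhost:9092",  # assumed broker address
    "group.id": "task-workers",             # hypothetical consumer group name
    "enable.auto.commit": False,            # offsets are committed manually below
    "auto.offset.reset": "earliest",
}


def process_tasks(consumer, handle: Callable[[bytes], None],
                  max_tasks: Optional[int] = None) -> int:
    """Poll tasks, run handle() on each, and commit only after handle() returns.

    If handle() raises, the exception propagates and the offset of that task
    is not committed, so the task is delivered again later.
    """
    done = 0
    while max_tasks is None or done < max_tasks:
        msg = consumer.poll(1.0)      # wait up to one second for a message
        if msg is None:
            continue                  # nothing arrived within the timeout
        if msg.error():
            continue                  # skip error events in this sketch
        handle(msg.value())           # do the actual work first
        consumer.commit(message=msg, asynchronous=False)  # then commit synchronously
        done += 1
    return done


def run_worker(topic: str = "tasks") -> None:
    """Wire the loop to a real consumer (topic name is a placeholder)."""
    from confluent_kafka import Consumer  # third-party client, imported lazily

    consumer = Consumer(CONSUMER_CONFIG)
    consumer.subscribe([topic])
    try:
        process_tasks(consumer, lambda payload: print("processing", payload))
    finally:
        consumer.close()
```

For long-running tasks, the consumer's max.poll.interval.ms setting probably needs to be raised as well, since a worker that does not poll within that interval is removed from the group and its partitions are reassigned.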

If a worker takes a task from the topic and commits the offset only when it finishes, then other workers may also take this task and process it. If the task is long-running, almost all workers will take the same task and process it, which completely defeats the purpose of distributing the work.

You could deal with this scenario by designing your topic with multiple partitions and running your consumers in a single consumer group. In Kafka, each partition is assigned to at most one consumer within a consumer group at any given time.

That means that as long as all your consumers (or "workers") belong to the same consumer group and the partition assignment is stable, two workers will not read and process the same message. If a worker fails before committing, its partitions are reassigned and another worker picks up the uncommitted tasks.
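To illustrate the one-consumer-per-partition rule, here is a toy round-robin model in plain Python. It is a simplified illustration, not Kafka's actual assignor, and `assign_partitions` is a hypothetical helper.

```python
from typing import Dict, List


def assign_partitions(partitions: List[int], workers: List[str]) -> Dict[str, List[int]]:
    """Give each partition to exactly one worker, round-robin over sorted names."""
    names = sorted(workers)
    assignment: Dict[str, List[int]] = {name: [] for name in names}
    if not names:
        return assignment
    for index, partition in enumerate(sorted(partitions)):
        assignment[names[index % len(names)]].append(partition)
    return assignment


# Six partitions, three workers: every partition has a single owner.
print(assign_partitions(list(range(6)), ["w1", "w2", "w3"]))
# {'w1': [0, 3], 'w2': [1, 4], 'w3': [2, 5]}

# Two partitions, three workers: one worker gets nothing to do.
print(assign_partitions([0, 1], ["w1", "w2", "w3"]))
# {'w1': [0], 'w2': [1], 'w3': []}
```

The second call shows the practical consequence: the number of partitions caps the number of workers that can be busy at once, so the topic should have at least as many partitions as workers.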

Upvotes: 3
