tlt

Reputation: 15191

Event Hub -- how to prevent duplicate handling when consumers scale out

When we have multiple consumers of an Event Hub (or any messaging service, for that matter), how do we make sure that no message is processed twice, especially when a consumer auto-scales out to multiple instances?

I know we could keep track of the last message processed, but between checking whether a message has been processed and actually processing it, another instance could already have processed it (a race condition?).

So, how do we solve that in a scalable way?

[UPDATE] I am aware there is a recommendation to have at least as many partitions as there are consumers, but what should we do when a single consumer cannot keep up with the messages directed to it and needs to scale out to multiple instances?

Upvotes: 1

Views: 4836

Answers (1)

Peter Bons

Reputation: 29711

Each processor takes a lease on a partition; see the docs:

An event processor instance typically owns and processes events from one or more partitions. Ownership of partitions is evenly distributed among all the active event processor instances associated with an event hub and consumer group combination.

So scaling out doesn't result in duplicate message processing because a new processor cannot take a lease on a partition that is already being handled by another processor.
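As an illustration, here is a minimal sketch using the Python `azure-eventhub` SDK with an Azure Blob Storage checkpoint store, which is what the processor instances use to coordinate partition ownership. The connection strings and names are placeholders:

```python
# pip install azure-eventhub azure-eventhub-checkpointstoreblob
from azure.eventhub import EventHubConsumerClient
from azure.eventhub.extensions.checkpointstoreblob import BlobCheckpointStore

# The checkpoint store is where processor instances coordinate
# partition ownership (leases) and persist their checkpoints.
checkpoint_store = BlobCheckpointStore.from_connection_string(
    "<storage-connection-string>", "<blob-container-name>"
)

client = EventHubConsumerClient.from_connection_string(
    "<event-hub-connection-string>",
    consumer_group="$Default",
    eventhub_name="<event-hub-name>",
    checkpoint_store=checkpoint_store,
)

def on_event(partition_context, event):
    # Only one instance owns this partition at a time, so events from
    # this partition are not handed to a second instance concurrently.
    print(f"Partition {partition_context.partition_id}: {event.body_as_str()}")

with client:
    client.receive(on_event=on_event, starting_position="-1")
```

Run the same program on several instances against the same checkpoint store and the partitions are redistributed among them automatically.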

Then, regarding your comment:

I am aware there is a recommendation to have at least as many partitions as there are consumers

It is the other way around: it is recommended to have as many consumers as you have partitions. If you have more consumers than partitions, the consumers will compete with each other to obtain a lease on a partition.

Now, regarding duplicate messages: since Event Hub guarantees at-least-once delivery, there isn't much you can do to prevent them. There aren't many scalable services that offer at-most-once delivery; I know that Azure Service Bus queues offer this if you really need it.

The question may arise: what can cause duplicate message processing? Well, while processing messages the processor does some checkpointing: once in a while it stores its position within a partition's event sequence (remember, a partition is bound to a single processor). If the processor instance crashes between two checkpoints, a new instance will resume processing messages from the position of the last checkpoint. That may very well lead to older messages being processed again.

If a reader disconnects from a partition, when it reconnects it begins reading at the checkpoint that was previously submitted by the last reader of that partition in that consumer group.
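With the same Python SDK, checkpointing is explicit: the event handler decides when to record its position. A minimal sketch (`process` is a hypothetical stand-in for your own logic):

```python
def on_event(partition_context, event):
    process(event)  # your processing logic (hypothetical helper)

    # Persist this event's position in the checkpoint store. Everything
    # processed after the last checkpoint is replayed if this instance
    # crashes before the next update_checkpoint call -- hence duplicates.
    partition_context.update_checkpoint(event)
```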

So that means you need to make sure your processing logic is idempotent. How you achieve that is up to you, as I don't know your use case.

One option is to track each individual message to see whether it has already been processed. If you do not have a unique ID to check on, you could generate a hash of the whole message and compare against that.
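A sketch of that hash-based check, reusing the handler from above; the in-memory set is for illustration only, since scaled-out instances would need a shared store such as Redis or a database table with a unique constraint:

```python
import hashlib

seen = set()  # illustration only; use a shared store across instances

def already_processed(payload: bytes) -> bool:
    """Return True if this exact payload was seen before."""
    digest = hashlib.sha256(payload).hexdigest()
    if digest in seen:
        return True
    seen.add(digest)
    return False

def on_event(partition_context, event):
    payload = event.body_as_str().encode("utf-8")
    if not already_processed(payload):
        process(event)  # hypothetical processing function
    # Checkpoint either way, so a replayed duplicate is skipped, not redone.
    partition_context.update_checkpoint(event)
```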

Upvotes: 5
