DSA

Reputation: 780

How to handle data processing failures in Azure Event Hub?

We are planning to use Event Hub for IoT device data streaming and processing. Our architecture is ready, but the one challenge we see is reprocessing after failures. Here is the scenario:

  1. IoT devices send data (passing through IoT Hub) to Event Hub. An Event Hub reader pushes the data to the database.
  2. If our processor fails (not for all telemetry, only for faulty data), we want to capture those events separately.
  3. Event Hub lets us set a checkpoint, but a checkpoint marks a position across the event stream, not a specific event.
  4. We only want to log the events that fail during processing, and we want to implement reprocessing logic for those events.

Any thoughts on this?

Upvotes: 1

Views: 2023

Answers (1)

Jesse Squire

Reputation: 7920

Event Hubs is meant to be read as a forward-only stream, where once an application reads an event, it has handled that event in the way that is appropriate to the application context.

Because Event Hubs prioritizes high throughput, the service intentionally does not provide a rich set of broker-side features, delegating more responsibility to the consuming application. Unfortunately, the features it leaves out include dead-lettering and the ability to mark an arbitrary set of events.

As Peter mentioned, Service Bus may be a better fit for your scenario. Copying events from Event Hubs into Service Bus for processing would give you built-in dead-lettering support, as well as other features that would simplify your application logic. This article provides a good comparison of the Azure messaging offerings, in case you want to consider the alternatives.

If you're set on using Event Hubs, Peter's suggestion of moving the poison/faulted event to another storage platform (message queue, database, etc.) is the recommended pattern. This lets you revisit those events as a set and remove them once you've dealt with them.
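To make that pattern concrete, here is a minimal sketch in plain Python. It deliberately does not call the Event Hubs SDK: `Event`, `DeadLetterStore`, `process_batch`, `reprocess`, and `save_to_database` are all hypothetical names, and the in-memory list stands in for whatever queue or table you would actually use. The idea is that a failing event is parked with its partition and offset, the reader keeps moving forward, and the checkpoint can still advance because every event was either handled or parked.

```python
import json
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Optional


@dataclass
class Event:
    # Illustrative stand-in for an Event Hubs event, not the SDK type.
    partition_id: str
    offset: int
    body: str


@dataclass
class DeadLetterStore:
    # Hypothetical side store; in practice a queue, table, or blob container.
    records: List[dict] = field(default_factory=list)

    def add(self, event: Event, error: Exception) -> None:
        self.records.append({
            "partition_id": event.partition_id,
            "offset": event.offset,
            "body": event.body,
            "error": repr(error),
        })


def process_batch(events: Iterable[Event],
                  handler: Callable[[Event], None],
                  dead_letters: DeadLetterStore) -> Optional[int]:
    """Run handler on each event, parking failures instead of stopping.

    Returns the offset of the last event seen, which is safe to checkpoint
    because every event up to it was either handled or parked.
    """
    last_offset = None
    for event in events:
        try:
            handler(event)
        except Exception as exc:  # a faulted event must not block the stream
            dead_letters.add(event, exc)
        last_offset = event.offset
    return last_offset


def reprocess(dead_letters: DeadLetterStore,
              handler: Callable[[Event], None]) -> int:
    """Retry parked events and keep only the ones that still fail."""
    still_failing = []
    for record in dead_letters.records:
        event = Event(record["partition_id"], record["offset"], record["body"])
        try:
            handler(event)
        except Exception as exc:
            record["error"] = repr(exc)
            still_failing.append(record)
    dead_letters.records = still_failing
    return len(still_failing)


def save_to_database(event: Event) -> None:
    # Placeholder for the real database write; rejects malformed telemetry.
    reading = json.loads(event.body)
    if "temperature" not in reading:
        raise ValueError("missing temperature")


events = [
    Event("0", 0, '{"temperature": 21.5}'),
    Event("0", 1, "not json"),
    Event("0", 2, '{"humidity": 40}'),
    Event("0", 3, '{"temperature": 19.0}'),
]
store = DeadLetterStore()
checkpoint = process_batch(events, save_to_database, store)
# checkpoint is 3; store.records holds the events at offsets 1 and 2
```

Storing the full body alongside the partition and offset means reprocessing never has to go back to Event Hubs, at the cost of duplicating the payload in the side store.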

An alternative would be to log just the partition and offset of the event and then use the EventHubConsumerClient or PartitionReceiver to read it back at a later time, but that is an inefficient pattern and would require a lot of temporary objects and network overhead.
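A rough illustration of why that alternative is costly, again in plain Python rather than the SDK. `read_from` and `fetch_failed` are hypothetical helpers, and a list index stands in for an offset, which is a simplification of how Event Hubs positions work. Each logged pointer needs its own reader opened at that position, only to pull a single event from it.

```python
from typing import Dict, Iterator, List, Tuple

# In-memory stand-in for partitions; the list index plays the role of an offset.
PartitionLog = Dict[str, List[str]]


def read_from(log: PartitionLog, partition_id: str,
              offset: int) -> Iterator[Tuple[int, str]]:
    """Stand-in for opening a reader at a position and streaming forward."""
    partition = log[partition_id]
    for position in range(offset, len(partition)):
        yield position, partition[position]


def fetch_failed(log: PartitionLog,
                 pointers: List[Tuple[str, int]]) -> List[Tuple[str, int, str]]:
    """Re-read each logged (partition, offset) pointer with its own reader."""
    recovered = []
    for partition_id, offset in pointers:
        reader = read_from(log, partition_id, offset)  # one reader per pointer
        _, body = next(reader)  # take one event, then discard the reader
        recovered.append((partition_id, offset, body))
    return recovered


log = {"0": ["a", "b", "c"], "1": ["x", "y"]}
failed = fetch_failed(log, [("0", 1), ("1", 0)])
```

Against the real service, every iteration of that loop would mean creating a client or receiver and opening a connection to fetch one event, which is where the temporary objects and network overhead come from.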

Upvotes: 2
