Reputation: 1704
Azure Event Hubs released a modern client library (Azure.Messaging.EventHubs) for reading from and writing to event hubs. The new library is supposed to replace the old one (Microsoft.Azure.EventHubs), so I wonder what the upgrade path should be for existing applications currently using the old library.
More specifically, does switching to the new library mean that the application must lose the checkpoints from the old version? While the migration guide provides a clear explanation of the upgrade benefits, as well as code examples, I couldn't find any mention of data loss.
Upvotes: 3
Views: 1047
Reputation: 7860
Your observations are correct. It was an unfortunate, though intentional, decision to not support legacy checkpoint data in the new client. In order for us to meet the goals set for unifying checkpoint data across languages for the new set of Event Hubs libraries and to make improvements to the algorithm used for managing partition ownership, breaking changes were necessary.
You're absolutely correct in that we should highlight this in the migration guide, offer guidance around how to migrate checkpoint data, and, ideally, offer a utility to assist with that migration. That's an oversight that was recently brought to our attention, and we're tracking some work to fix that. (see: #11373, #11374)
The best work-around in the short term would be to handle the EventProcessorClient.PartitionInitializingAsync event and set the PartitionInitializingEventArgs.DefaultStartingPosition value in the arguments received to the position that was recorded in the legacy checkpoint. This would be necessary only for the first run, until a new checkpoint was recorded using the EventProcessorClient, but would require reading and parsing the legacy checkpoint blob to determine the position. This sample illustrates the approach.
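The parsing half of this work-around could look roughly like the Python sketch below. It covers only reading a legacy checkpoint document and extracting the recorded position. It assumes the legacy blob holds JSON with `Offset`, `SequenceNumber`, and `PartitionId` fields; the function name `legacy_start_position` and the sample token value are hypothetical. Downloading the blob and assigning the result to DefaultStartingPosition inside the .NET handler are not shown.

```python
import json


def legacy_start_position(blob_text):
    """Extract the recorded position from a legacy checkpoint document.

    Assumes the JSON carries Offset, SequenceNumber, and PartitionId fields.
    """
    doc = json.loads(blob_text)
    return {
        "partition_id": doc["PartitionId"],
        # Normalize to int in case the value was serialized as a string.
        "sequence_number": int(doc["SequenceNumber"]),
        "offset": doc["Offset"],
    }


# Sample document; the Token value is a made-up placeholder.
sample = (
    '{"Offset":"0","SequenceNumber":0,"PartitionId":"0",'
    '"Owner":"host-x","Token":"00000000-0000-0000-0000-000000000000","Epoch":62}'
)
position = legacy_start_position(sample)
```

The returned sequence number (or offset) is the value that would be handed to the partition-initializing handler for the matching partition on the first run.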
Upvotes: 2
Reputation: 1704
It turns out the new SDK doesn't use the same checkpoint file format as the old version, so after transitioning to the new library the old checkpoints will not be respected. The new version will start reading from the beginning of the event hub (that is, from the oldest event still within the configured retention time).
Both versions of the library use Azure Blob Storage for handling checkpoints and leases.
However, the old library used a single file per partition, which contained both the checkpoint and the owner information in JSON format. For example, the file named 0, for partition 0, had the following content:
{"Offset":"0","SequenceNumber":0,"PartitionId":"0","Owner":"host-x","Token":"<guid>","Epoch":62}
The new library uses two files for each partition, each file in a separate folder. The folders are named ownership and checkpoint, and they contain one file per partition ID. Files in the ownership folder have no content, and the owner ID is stated in the metadata of the blob. Similarly, files in the checkpoint folder have no content, and the progress data is stored in the metadata, in two different fields: offset and sequencenumber.
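To make the difference concrete, here is a minimal Python sketch of how the values in a legacy checkpoint document might map onto the two metadata fields described above. It is untested against a real storage account; the function name `to_new_checkpoint_metadata` and the sample values are hypothetical, and any other metadata the new library may expect is not covered.

```python
import json


def to_new_checkpoint_metadata(legacy_blob_text):
    """Map a legacy checkpoint document to the new metadata fields."""
    doc = json.loads(legacy_blob_text)
    # Blob metadata is a set of string name/value pairs, so both values
    # are converted to strings.
    return {
        "offset": str(doc["Offset"]),
        "sequencenumber": str(doc["SequenceNumber"]),
    }


# Sample legacy document with made-up values.
legacy = (
    '{"Offset":"4096","SequenceNumber":17,"PartitionId":"0",'
    '"Owner":"host-x","Token":"sample-token","Epoch":62}'
)
metadata = to_new_checkpoint_metadata(legacy)
```

The resulting dictionary would be attached as metadata to an empty blob in the checkpoint folder; the upload step itself is left out.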
In addition, the new library has a deeper folder structure: /EventHubsNamespace/EventHubsName/ConsumerGroupName/ instead of the /ConsumerGroupName/ structure used by the old library.
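As an illustration of the two layouts, the following Python sketch builds the blob names for one partition. It assumes the ownership and checkpoint folders sit directly under the consumer-group prefix, and it does not settle details such as whether the namespace is the fully qualified host name or how the names are cased; check those against an actual container. Both function names and the sample values are hypothetical.

```python
def legacy_blob_name(consumer_group, partition_id):
    """Old layout: one blob per partition under the consumer group."""
    return f"{consumer_group}/{partition_id}"


def new_blob_names(namespace, event_hub, consumer_group, partition_id):
    """New layout: separate ownership and checkpoint blobs per partition.

    Assumes both folders live directly under the consumer-group prefix.
    """
    prefix = f"{namespace}/{event_hub}/{consumer_group}"
    return {
        "ownership": f"{prefix}/ownership/{partition_id}",
        "checkpoint": f"{prefix}/checkpoint/{partition_id}",
    }


# Placeholder names for illustration only.
old_name = legacy_blob_name("my-group", "0")
names = new_blob_names("my-namespace", "my-hub", "my-group", "0")
```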
It should be possible to write a script that migrates the checkpoint files to the new format, since all the needed information seems to be available, but I haven't tested that.
Upvotes: 5