Reputation: 338
I'm in the process of designing an event-driven architecture based on Azure Event Grid. Part of the design is high availability. Looking at the documentation, I read that there is server-side geo disaster recovery, where out of the box the metadata about topics and subscriptions is replicated to the pair region. When the primary region goes down, the pair region takes over and the events keep flowing.
So the impact on event subscribers seems minimal. However, if Event Grid needs to fail over, that probably means there is a serious issue in that region that also affects subscribers. To mitigate the risk of subscribers (Functions) being unavailable in a certain region, I'm planning to use webhooks, APIM, and a load balancer to forward events to a different region in case of a DR event.
How about the impact on the event publishers? The URL to publish events to contains a region, so does that URL change when Event Grid fails over to the pair region?
I'm thinking about making the topics available via APIM and pointing to the pair region when the primary region is unhealthy, but we already have a secondary region on the other side of the ocean. So in that case it makes more sense to duplicate the topics to that other region and let APIM handle forwarding to one of the available regions. In this scenario we end up with two active (load balanced) topics.
Although I typically like to use out-of-the-box (OOTB) functionality as much as possible, I think in this case a custom DR solution would be appropriate.
So besides the impact on event publishers with OOTB DR, I'm looking for best practices for DR with Event Grid. Is my approach feasible?
Upvotes: 1
Views: 555
Reputation: 11
Regarding "How about the impact on the event publishers?
The URL to publish events to contains a region, so does that URL change when Event Grid fails over to the pair region?", the URL used to publish events is not affected. During a failover, what the URL refers to is remapped, so publishers keep using the same URL.
Regarding "I'm thinking about making the topics available via APIM...
In this scenario we end up with two active (load balanced) topics.", that is an option. We also have documented a client active-passive approach to failover at LINK, but it seems that your APIM approach would handle the forwarding for you.
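For readers who want to see what a client-side active-passive failover could look like, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not the documented sample: the two endpoint URLs are placeholders, and `send` is a hypothetical callable that you would back with your own HTTP client or SDK call to publish to a topic.

```python
from typing import Callable, Sequence

# Placeholder endpoints (assumptions); use the endpoints of your own topics.
PRIMARY = "https://primary-topic.example/api/events"
SECONDARY = "https://secondary-topic.example/api/events"


def publish_with_failover(
    event: dict,
    endpoints: Sequence[str],
    send: Callable[[str, dict], None],
) -> str:
    """Try each endpoint in order and return the one that accepted the event.

    `send` is a hypothetical publish function that raises on failure.
    """
    errors = []
    for endpoint in endpoints:
        try:
            send(endpoint, event)
            return endpoint
        except Exception as exc:  # narrow this to your client's error types
            errors.append(f"{endpoint}: {exc}")
    # Every endpoint failed, so surface all collected errors to the caller.
    raise RuntimeError("all endpoints failed: " + "; ".join(errors))
```

Calling `publish_with_failover(event, [PRIMARY, SECONDARY], send)` publishes to the primary topic and only falls back to the secondary when the primary call raises. With APIM in front of both topics, this ordering logic would live in the APIM policy instead of in each publisher.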
As to whether your approach is feasible, it looks like a sensible approach to me. Please let us know if you have any feedback.
Upvotes: 1