amkingTRP
How to avoid race conditions in a distributed lock system using replicated redis clusters (or other replicated storage systems)?

We run identical services in two Azure regions, backed by a Redis system that is replicated (synchronised) across both regions. This is Enterprise active-active replication.

An entry may be written to either Redis instance, and Azure replicates it to the other region. A service in each region frequently scans for these entries; when it finds one, it attempts to acquire a distributed lock keyed on that entry.

The distributed lock is another Redis entry, created using the StackExchange.Redis library:

StringSetAsync(key, value, expiry, When.NotExists, flags);

What we're finding is that the services in both regions attempt to grab the distributed lock at roughly the same time (a few milliseconds apart). Because When.NotExists is only checked against the local instance, replication latency sometimes lets each service obtain the lock in its own region before the other region's write arrives, so the replication effectively "crosses over". Both services then do identical work and the system produces duplicated output (which is a big problem).
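
To make the crossover concrete, here is a minimal Python sketch of what we think is happening. It is a toy model, not our actual code and not a Redis client: the RegionStore class, the replicate helper, and the key and worker names are all hypothetical, and replication is reduced to an explicit function call so the ordering is easy to see.

```python
class RegionStore:
    """Toy stand-in for one regional Redis instance (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.outbox = []  # local writes not yet shipped to the peer region

    def set_if_not_exists(self, key, value):
        """Mimics a set-if-not-exists write; only the LOCAL copy is checked."""
        if key in self.data:
            return False
        self.data[key] = value
        self.outbox.append((key, value))
        return True


def replicate(source, target):
    """Ship pending writes from source to target and return conflicting keys.

    A real active-active setup resolves conflicts by its own rules; this toy
    model only reports them.
    """
    conflicts = []
    for key, value in source.outbox:
        if key in target.data and target.data[key] != value:
            conflicts.append(key)
        else:
            target.data[key] = value
    source.outbox.clear()
    return conflicts


def simulate(replicate_between=False):
    """Both regions try to take the same lock; optionally replicate in between."""
    east = RegionStore("east")
    west = RegionStore("west")

    east_got = east.set_if_not_exists("lock:entry-1", "east-worker")
    if replicate_between:
        # Replication wins the race: west sees the lock before it tries.
        replicate(east, west)
    west_got = west.set_if_not_exists("lock:entry-1", "west-worker")

    # Replication catches up in both directions afterwards.
    conflicts = replicate(east, west) + replicate(west, east)
    return east_got, west_got, conflicts
```

Calling simulate() models our failure case: both attempts return True and each direction of replication then finds a conflicting value for the same key. Calling simulate(replicate_between=True) models the case where replication arrives first, so only the first region gets the lock. The point is that the lock is only as exclusive as the local view at the moment of the write.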

In our situation, replication of the Redis cluster is required for an SLA. The similar questions we found on Stack Overflow don't appear to involve replication.

There are various solutions we are going to investigate:

There are some more bespoke ideas we might look at, but these would be a last resort.

What we're interested in is what solution worked best for other people in this sort of situation (the requirement for replication) and whether people can suggest anything else we haven't considered yet.
