Reputation: 175
Background: currently I am investigating the mechanism of distributed lock in order to implement it in our email service to deduplicate in the time the QPS is high.
The main article I am reading is this one https://redis.io/docs/reference/patterns/distributed-locks/#retry-on-failure from redis official.
In this article it mentioned in the retry section that
When a client is unable to acquire the lock, it should try again after a random delay in order to try to desynchronize multiple clients trying to acquire the lock for the same resource at the same time (this may result in a split brain condition where nobody wins)
Also the faster a client tries to acquire the lock in the majority of Redis instances, the smaller the window for a split brain condition (and the need for a retry), so ideally the client should try to send the SET commands to the N instances at the same time using multiplexing.
The fact that when a client needs to retry a lock, it waits a time which is comparably greater than the time needed to acquire the majority of locks, in order to probabilistically make split brain conditions during resource contention unlikely.
I read a few recommended open source implementation of redlock from git, indeed there is delay between retry, copy and pasted here for your convenience.
func (m *Mutex) LockContext(ctx context.Context) error {
if ctx == nil {
ctx = context.Background()
}
value, err := m.genValueFunc()
if err != nil {
return err
}
for i := 0; i < m.tries; i++ {
if i != 0 {
select {
case <-ctx.Done():
// Exit early if the context is done.
return ErrFailed
case <-time.After(m.delayFunc(i)):
// Fall-through when the delay timer completes.
}
}
start := time.Now()
n, err := func() (int, error) {
ctx, cancel := context.WithTimeout(ctx, time.Duration(int64(float64(m.expiry)*m.timeoutFactor)))
defer cancel()
return m.actOnPoolsAsync(func(pool redis.Pool) (bool, error) {
return m.acquire(ctx, pool, value)
})
}()
if n == 0 && err != nil {
return err
}
now := time.Now()
until := now.Add(m.expiry - now.Sub(start) - time.Duration(int64(float64(m.expiry)*m.driftFactor)))
if n >= m.quorum && now.Before(until) {
m.value = value
m.until = until
return nil
}
_, err = func() (int, error) {
ctx, cancel := context.WithTimeout(ctx, time.Duration(int64(float64(m.expiry)*m.timeoutFactor)))
defer cancel()
return m.actOnPoolsAsync(func(pool redis.Pool) (bool, error) {
return m.release(ctx, pool, value)
})
}()
if i == m.tries-1 && err != nil {
return err
}
}
return ErrFailed
}
What I do not understand is that why the existence of the retry can help to avoid brain split?
Intuitively the backoff time(delay) for the retry is used to avoid overloading the redis instance.
Upvotes: 2
Views: 1293