Distributed lock - Two nodes believing they hold a valid token after a process pause

Question

In this article, Martin Kleppmann claims that using fencing tokens solves the issue of process pauses and uses the following diagram to demonstrate it:

In the diagram, the write from Client 1 (token 33) is rejected by the Storage service because it has already seen a higher token (34).

However, it looks to me like a race condition could still happen if Client 1 wakes up a bit earlier and sends its write request with token 33 before Client 2 sends its request with token 34. We could end up in a situation where:

1. Storage receives the write request with token 33 and updates last_seen_token to 33.
2. The Storage node starts writing the value associated with token 33.
3. While still writing the value provided by Client 1, the Storage node receives the write request from Client 2 with token 34. Since 34 > 33, this write is also accepted and we end up with two concurrent writes.
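
To make the interleaving concrete, here is a minimal single-threaded sketch of it in Python. The `NaiveStorage` class and its `begin_write` / `finish_write` methods are hypothetical names made up for this illustration; they are not from the article. The sketch assumes the token check and the data write are two separate steps on the storage node.

```python
class NaiveStorage:
    """Hypothetical storage node: the token check and the data write are separate steps."""

    def __init__(self):
        self.last_seen_token = 0
        self.in_flight = set()    # tokens whose writes have started but not finished
        self.max_concurrent = 0   # highest number of overlapping writes observed

    def begin_write(self, token):
        # Step 1: fencing check, then remember the token.
        if token <= self.last_seen_token:
            return False
        self.last_seen_token = token
        # Step 2 begins: the write is now in progress (and may take a while).
        self.in_flight.add(token)
        self.max_concurrent = max(self.max_concurrent, len(self.in_flight))
        return True

    def finish_write(self, token):
        self.in_flight.discard(token)


storage = NaiveStorage()
accepted_33 = storage.begin_write(33)  # Client 1 wakes up early and gets in first
accepted_34 = storage.begin_write(34)  # arrives while the write for 33 is still running
storage.finish_write(33)
storage.finish_write(34)

print(accepted_33, accepted_34, storage.max_concurrent)  # True True 2
```

Both requests pass the check because each token is higher than the last one seen at the moment it arrives, so two writes overlap.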

What am I missing? It looks like the Storage node needs its own lock to ensure that writing a value and checking/updating the last_seen_token are one atomic operation. But if we're doing that, it seems to defeat the purpose of having a lock service in the first place.
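
For reference, this is roughly what I mean by making the check and the write one atomic operation. It is only a sketch under my own assumptions: `FencedStorage` and `write` are hypothetical names, and the storage node is modelled as a single process guarded by a local `threading.Lock`.

```python
import threading


class FencedStorage:
    """Hypothetical storage node that checks the token and writes under one local lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self.last_seen_token = 0
        self.value = None

    def write(self, token, value):
        # The check, the token update, and the write happen as one critical section,
        # so a second request cannot slip in between them.
        with self._lock:
            if token <= self.last_seen_token:
                return False  # stale token: reject
            self.last_seen_token = token
            self.value = value
            return True


storage = FencedStorage()
print(storage.write(33, "from client 1"))             # True
print(storage.write(34, "from client 2"))             # True
print(storage.write(33, "late retry from client 1"))  # False
print(storage.value)                                  # from client 2
```

Here the write with token 34 can only start after the write with token 33 has finished, and any later request carrying token 33 is rejected. The lock in this sketch is local to the storage process and is held only for the duration of a single write.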
