How does client handle failures in RAFT-replicated datastores?

Consider a database like cockroachDB that uses RAFT protocol for replicating data to a replica group owning a partition of the data. How does a client handle a request that fails in such DBs? Because, in raft, an uncommitted entry could later get re-applied in case of a failover which promoted a node having that uncommitted entry in its log.

Upvotes: 1

Answers (4)

FranckPachot

Reputation: 597

Let me try to understand your question in detail.

Success Scenario on RF3:

Phase 1 (Synchronous):
- A write operation is sent to the Raft leader.
- The Raft leader writes the operation to its Write-Ahead Log (WAL) and sends it parallel to two followers.
- One follower acknowledges the receipt of the write.
- The leader applies the WAL to its datastore and confirms that the write was successful.
Phase 2 (Asynchronous):
- Later (during the next operation or heartbeat), the leader commits the write operation to the followers, who then update their datastores.

If the leader dies before completing the second phase, the two followers will elect a new leader. One follower knows that it acknowledged the uncommitted log from the old leader. With this follower plus the old leader, it reaches the quorum. The log was effectively committed.

Failure Scenario

However, if the follower's acknowledgment did not reach the leader before the timeout, the leader might report the write operation as unsuccessful. This scenario leads to a situation where an error was reported, but the write was applied (two peers have it in their log).

As a result, the client may attempt to retry the request with the new leader due to the error. As I mentioned above, this can cause an ERROR: Duplicate request message to occur.

Suppose it doesn't retry (say all nodes crashed simultaneously). In that case, we don't know if the failure happened before or after committing the write operation to the database. As mentioned by Dorian, this occurs with any database that gets an error for a commit operation and it is unrelated to Raft.

Upvotes: 0

Ryan Luu

Reputation: 21

For CockroachDB, a certain error code (40001) is returned, which signals to the caller that there is a retryable transaction error. The client is then expected to retry, just like some answers above indicate.

Docs here: https://www.cockroachlabs.com/docs/stable/common-errors#restart-transaction

Upvotes: 1

FranckPachot

Reputation: 597

The client talks to the Raft leader. If it detects an error it may retry, but it keeps an id of the request and the case you described may receive an error like this one in YugabyteDB: ERROR: Duplicate request 126674 from client 08b09407-d78e-4c10-982f-be446d9a4e87 (min running 126674)

When using the PostgreSQL compatible API (YSQL) of YugabyteDB, the Raft client is the PostgreSQL backend. If the database client (the application) encounters an error when it runs a COMMIT operation, it may need to retry the transaction. In this scenario, as there is no ID, the application must ensure that the transaction is idempotent if it retries. In SQL, integrity constraints may provide protection against such situations, but it's important to consider additional checks that may be necessary for automatic retries.

Upvotes: 1

dh YB

Reputation: 1097

In the case of YugabyteDB, the client doesn't know about internal RAFT replication details. It can trust the db with success or error just like with a single node database with no replication. It will be considered a bug on the db otherwise.

For more details, please describe in detail the exact scenario you had in mind.

Upvotes: 1

How does client handle failures in RAFT-replicated datastores?

Answers (4)

Related Questions