Reputation: 48682
When using replication with a quorum, Elasticsearch allows writes to fail for some (a small number of) replica shards. Writing to a replica might fail only because it is temporarily unavailable (because of a temporary network partition, for example). When that shard becomes available again (the network is fixed, for example), what happens?
Does Elasticsearch automatically detect that the shard is out of date (stale, inconsistent with the primary shard) and update it in the background? Or must you perform a manual operation? When the shard returns from being unavailable, but is out of date, does Elasticsearch automatically refrain from querying that shard (and retrieving stale data) until it is brought up to date? Or must you provide special query parameters it ensure that out-of-date shards are not used?
Upvotes: 1
Views: 1227
Reputation: 10859
Be careful: Quorum is generally associated with the election of the one master node out of all master eligible nodes. That master maintains the cluster state, which keeps track of the one primary shard (plus 0 or more replica shards) — there is no quorum involved for this.
The replication protocol has been improved a lot in 6.0 with sequence numbers and primary terms. A good overview is the blog post about it. Basically all operations are numbered (per shard), so missing operations can be detected and replayed; see the recovery part in the blog post in particular.
With failing primary shards it can get a little more interesting; one great post about more details is available on Elastic's discuss.
Upvotes: 1
Reputation: 691
Elasticsearch manages automatically the replica that it are out of date. No manual operation or special query are necessary.
In case of nodes/network failure you have to ensure that a quorum of the cluster remain online, otherwise you will encounter the split brain problem in which you cannot known which of the replica is in line and which is out of date.
Upvotes: 1