QIU YULONG

Reputation: 101

Primary shard unassigned after adding more nodes (or node lost)

Some indices have a primary shard in unassigned status, so the cluster health shows red. If I run GET _cluster/allocation/explain, it gives details like:

"unassigned_info": {
    "reason": "ALLOCATION_FAILED",
    "at": "2018-06-19T00:29:17.781Z",
    "failed_attempts": 5,
    "delayed": false,
    "details": "failed to create shard, failure IOException[failed to obtain in-memory shard lock]; nested: ShardLockObtainFailedException[[log_2018_05][10]: obtaining shard lock timed out after 5000ms]; ",
    "allocation_status": "deciders_no"
}

This happened twice in our production environment. The first time, multiple data nodes were lost during a single network disconnection. The second time, we had just added 2 more data nodes to the environment.

How should this be handled?

Upvotes: 1

Views: 814

Answers (1)

QIU YULONG

Reputation: 101

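The cluster is red because the primary shard stays unassigned. In the explain output, allocation failed 5 times in a row ("failed_attempts": 5), each time on a shard lock timeout. As far as I understand, Elasticsearch stops retrying on its own once it reaches the retry limit (index.allocation.max_retries, 5 by default), so the shard remains unassigned until a retry is triggered manually.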

Elastic's support suggested the following API call, which retries shard allocations that previously failed:

 POST _cluster/reroute?retry_failed

That should reroute the unassigned primary shards to available nodes, and the cluster should turn from red to yellow status (which means all primary shards are assigned, but some replicas are not yet allocated).
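If you want to script this, here is a minimal Python sketch of the same call using only the standard library. It is not an official client. The `ES_URL` value, the helper names, and the idea of checking `failed_attempts` against a limit of 5 are my own assumptions for illustration, so adjust them to your setup (authentication and TLS are left out).

```python
import json
import urllib.request

# Assumption: an unsecured cluster reachable at this address.
ES_URL = "http://localhost:9200"


def needs_manual_retry(unassigned_info, max_retries=5):
    """Return True if allocation failed and the retry budget looks exhausted.

    `unassigned_info` is the "unassigned_info" object from
    GET _cluster/allocation/explain. `max_retries` is assumed to match
    index.allocation.max_retries on the affected index.
    """
    return (
        unassigned_info.get("reason") == "ALLOCATION_FAILED"
        and unassigned_info.get("failed_attempts", 0) >= max_retries
    )


def retry_failed_request(base_url=ES_URL):
    """Build (without sending) the POST _cluster/reroute?retry_failed=true request."""
    url = base_url.rstrip("/") + "/_cluster/reroute?retry_failed=true"
    return urllib.request.Request(url, method="POST")


def cluster_status(base_url=ES_URL):
    """Fetch the cluster health and return its status: green, yellow, or red."""
    url = base_url.rstrip("/") + "/_cluster/health"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["status"]


def retry_and_report(base_url=ES_URL):
    """Send the retry request, then print the resulting cluster status."""
    with urllib.request.urlopen(retry_failed_request(base_url)) as resp:
        print("acknowledged:", json.load(resp).get("acknowledged"))
    print("cluster status:", cluster_status(base_url))
```

Calling `retry_and_report()` against a live cluster sends the request. Allocation happens asynchronously, so the status may still read red for a short while and can be polled again with `cluster_status()`.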

Upvotes: 1
