How do I fix a red status on an OpenSearch cluster?

Question

We have an OpenSearch cluster and noticed that it had gone down. AWS Support helped me recover it, and the cluster is active again, but it is still in RED status because one of the primary shards is unassigned.

It looks like the shard became unassigned during the outage. I'm not sure how to get the cluster back to GREEN status.

Any suggestions on how to fix this?

Should I delete this shard? Would that fix it? I tried reassigning it, but that does not seem to work because the shard copy is missing. Our backups were also affected while the cluster was down.

GET _cluster/health?pretty

{
  "cluster_name" : "xxxx-xxx-xxx",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 3,
  "discovered_master" : true,
  "active_primary_shards" : 150,
  "active_shards" : 300,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 4,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 98.68421052631578
}
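A minimal Python sketch, assuming the health response above is available as a string, that summarizes it and cross-checks the reported percentage. The helper name `summarize_health` is hypothetical, and only a subset of the fields above is copied in. With no relocations in progress, the percentage is active shard copies divided by all shard copies, so 300 / (300 + 4) gives roughly 98.68%.

```python
import json

# Subset of the GET _cluster/health?pretty response from the question.
HEALTH_JSON = """
{
  "cluster_name" : "xxxx-xxx-xxx",
  "status" : "red",
  "number_of_nodes" : 3,
  "active_primary_shards" : 150,
  "active_shards" : 300,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 4,
  "active_shards_percent_as_number" : 98.68421052631578
}
"""


def summarize_health(health: dict) -> dict:
    """Hypothetical helper: recompute the active-shard percentage from the counters."""
    # Assumption: no relocations are in progress (relocating_shards is 0 here),
    # so every shard copy is either active, initializing or unassigned.
    total = (health["active_shards"]
             + health["initializing_shards"]
             + health["unassigned_shards"])
    return {
        "status": health["status"],
        "unassigned": health["unassigned_shards"],
        "total_shard_copies": total,
        "recomputed_percent": 100.0 * health["active_shards"] / total,
        "reported_percent": health["active_shards_percent_as_number"],
    }


if __name__ == "__main__":
    summary = summarize_health(json.loads(HEALTH_JSON))
    for key, value in summary.items():
        print(f"{key}: {value}")
```

The health output counts 4 unassigned shard copies, while `_cluster/allocation/explain` without a request body reports on a single unassigned shard. `GET _cat/shards?v&h=index,shard,prirep,state,unassigned.reason` lists every shard copy with its state, so the remaining unassigned ones can be found and explained individually.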

GET _cluster/allocation/explain?pretty

{
  "index" : ".opendistro-alerting-alerts",
  "shard" : 4,
  "primary" : true,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "CLUSTER_RECOVERED",
    "at" : "2022-01-11T13:14:16.096Z",
    "last_allocation_status" : "no_valid_shard_copy"
  },
  "can_allocate" : "no_valid_shard_copy",
  "allocate_explanation" : "cannot allocate because a previous copy of the primary shard existed but can no longer be found on the nodes in the cluster",
  "node_allocation_decisions" : [ {
    "node_id" : "xxxxx",
    "node_name" : "sssssssssssssssssssss",
    "node_decision" : "no",
    "store" : {
      "found" : false
    }
  },
  ...
  ]
}
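`no_valid_shard_copy` together with `"store" : { "found" : false }` means no node holds any copy of that primary, so OpenSearch cannot recover it by itself. A hedged Python sketch of the decision, assuming the explain output above (closed after the first node decision so it parses): if some node still reported a copy, `allocate_stale_primary` would be an option; with no copy anywhere, the choices are restoring the index from a snapshot or, accepting that the shard's documents are lost, sending `allocate_empty_primary` with `accept_data_loss: true` to `POST _cluster/reroute`. The helper names are hypothetical, and Amazon OpenSearch Service may not permit `_cluster/reroute` on managed domains, in which case deleting and recreating or restoring the index is the fallback.

```python
import json

# Allocation explanation from the question, trimmed and closed after the
# first node decision so that it parses.
EXPLAIN_JSON = """
{
  "index" : ".opendistro-alerting-alerts",
  "shard" : 4,
  "primary" : true,
  "current_state" : "unassigned",
  "can_allocate" : "no_valid_shard_copy",
  "node_allocation_decisions" : [ {
    "node_id" : "xxxxx",
    "node_name" : "sssssssssssssssssssss",
    "node_decision" : "no",
    "store" : { "found" : false }
  } ]
}
"""


def stale_copy_nodes(explain: dict) -> list:
    """Hypothetical helper: names of nodes that still report an on-disk copy."""
    return [d["node_name"]
            for d in explain.get("node_allocation_decisions", [])
            if d.get("store", {}).get("found")]


def empty_primary_reroute(explain: dict, node_name: str) -> dict:
    """Hypothetical helper: request body for POST _cluster/reroute that
    allocates an EMPTY primary. All documents previously in the shard are lost."""
    return {
        "commands": [{
            "allocate_empty_primary": {
                "index": explain["index"],
                "shard": explain["shard"],
                "node": node_name,
                "accept_data_loss": True,
            }
        }]
    }


if __name__ == "__main__":
    explain = json.loads(EXPLAIN_JSON)
    if explain["can_allocate"] == "no_valid_shard_copy" and not stale_copy_nodes(explain):
        # No copy survives anywhere: restore from a snapshot, or accept data loss.
        target = explain["node_allocation_decisions"][0]["node_name"]
        print("POST _cluster/reroute")
        print(json.dumps(empty_primary_reroute(explain, target), indent=2))
```

A single shard cannot be deleted on its own; the unit of deletion is the whole index (`DELETE <index>`), so "deleting the shard" in practice means deleting or replacing the index it belongs to.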
