phil

Elasticsearch won't replicate to new node

We've been running a production system with a single node for over a year and decided to bump up to having 2 nodes for some resiliency.

I can telnet between machines without issue.

I can issue curl commands between machines.

We have upgraded production from 6.8 (default distribution) to 7.7.1 OSS, and have created a brand new node also running 7.7.1 OSS.
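
For context, the relevant parts of the new node's elasticsearch.yml look roughly like this (a sketch only; the IPs are placeholders and network.host depends on the environment):

# /etc/elasticsearch/elasticsearch.yml on the new node (sketch)
cluster.name: zm-amz-data
node.name: elasticsearch-02
network.host: <new node ip>
discovery.seed_hosts: ["<master node ip>:9300"]
# cluster.initial_master_nodes is left unset because this node joins an existing cluster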

Master:

 curl -XGET 'http://localhost:9200/?pretty'
{
  "name" : "elasticsearch-01",
  "cluster_name" : "zm-amz-data",
  "cluster_uuid" : "EzB5di4pQzm7whY4fkpkbQ",
  "version" : {
    "number" : "7.7.1",
    "build_flavor" : "oss",
    "build_type" : "deb",
    "build_hash" : "ad56dce891c901a492bb1ee393f12dfff473a423",
    "build_date" : "2020-05-28T16:30:01.040088Z",
    "build_snapshot" : false,
    "lucene_version" : "8.5.1",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}

New Node:

curl -XGET 'http://localhost:9200/?pretty'
{
  "name" : "elasticsearch-02",
  "cluster_name" : "zm-amz-data",
  "cluster_uuid" : "EzB5di4pQzm7whY4fkpkbQ",
  "version" : {
    "number" : "7.7.1",
    "build_flavor" : "oss",
    "build_type" : "deb",
    "build_hash" : "ad56dce891c901a492bb1ee393f12dfff473a423",
    "build_date" : "2020-05-28T16:30:01.040088Z",
    "build_snapshot" : false,
    "lucene_version" : "8.5.1",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}

Still no luck. After 3 days I'm getting close to throwing ES out. The new node sees the master and joins the cluster, but no data is replicating and the cluster status is red.

curl -XGET 'http://localhost:9200/_cluster/health?pretty'
{
  "cluster_name" : "zm-amz-data",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 2,
  "number_of_data_nodes" : 2,
  "active_primary_shards" : 29,
  "active_shards" : 29,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 39,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 42.64705882352941
}

There are no errors on the new node or the master. This is the end of the master log:

[2020-06-07T00:07:00,644][INFO ][o.e.c.s.MasterService    ] [elasticsearch-01] node-join[{elasticsearch-02}{e_toEmodToGU98qY6MZaWQ}{kiBIItOtRpql0GcThMtkHg}{<new node ip>}{<new node ip>:9300}{dimr} join existing leader], term: 3, version: 154, delta: added {{elasticsearch-02}{e_toEmodToGU98qY6MZaWQ}{kiBIItOtRpql0GcThMtkHg}{<new node ip>}{<new node ip>:9300}{dimr}}
[2020-06-07T00:07:01,131][INFO ][o.e.c.s.ClusterApplierService] [elasticsearch-01] added {{elasticsearch-02}{e_toEmodToGU98qY6MZaWQ}{kiBIItOtRpql0GcThMtkHg}{<new node ip>}{<new node ip>:9300}{dimr}}, term: 3, version: 154, reason: Publication{term=3, version=154}

All I want is to get all our data from the master onto the new node!

As requested:

curl -XGET 'localhost:9200/_cat/nodes?pretty'
<master node IP> 25 25 0 0.29 0.33 0.24 dimr * elasticsearch-01
<new node IP> 24 31 0 0.01 0.00 0.00 dimr - elasticsearch-02

curl -XGET 'localhost:9200/_cat/allocation?pretty'
29 86.1gb 276.7gb 963.5gb   1.2tb 22 <master node IP> <master node IP> elasticsearch-01
 0     0b   3.7gb 151.1gb 154.8gb  2 <new node IP> <new node IP> elasticsearch-02
39                                                               UNASSIGNED

curl -XGET 'localhost:9200/_cluster/settings?pretty'
{
  "persistent" : {
    "archived" : {
      "xpack" : {
        "monitoring" : {
          "collection" : {
            "enabled" : "true"
          }
        }
      }
    },
    "cluster" : {
      "routing" : {
        "allocation" : {
          "enable" : "primaries"
        }
      }
    }
  },
  "transient" : { }
}

curl -XGET 'localhost:9200/_cat/indices?pretty'
yellow open app_indices_one                X73QG8FeR3qbfwvUTgZM4w 5 1 141039497 6841171  85.4gb  85.4gb
yellow open .monitoring-kibana-6-2020.05.31 weQn3afBQ3yQ_gWbB1ZeBA 1 1      8639       0   1.9mb   1.9mb
red    open .apm-custom-link                Cm4oM-fJRs6o8RH275pshQ 1 1                                  
yellow open .monitoring-kibana-6-2020.06.04 dEsNwfodSQCy4a5FsfVzbQ 1 1      8610       0   1.9mb   1.9mb
red    open .kibana_task_manager_2          IsjOQqoWTxSytilnTtAHLw 1 1                                  
yellow open .monitoring-es-6-2020.06.05     vjZgiH6wTmS8nM9uhZ2Z6g 1 1    208537     507 111.8mb 111.8mb
yellow open .monitoring-es-6-2020.06.06     qV2J_qtnQoGFg6C8R-mIOA 1 1     32582     273  18.4mb  18.4mb
yellow open .monitoring-kibana-6-2020.06.03 qQWZQ8XoRxS9rhNDbN-THQ 1 1      8577       0   2.1mb   2.1mb
red    open .kibana_task_manager_1          iAwrQJnKSm2N_VU1Q-LgQA 1 1                                  
yellow open .monitoring-kibana-6-2020.06.02 xWal0unTS2qsYxPnOmmhPw 1 1      8639       0   1.9mb   1.9mb
yellow open .monitoring-es-6-2020.06.03     EpT1Ex5eQiKwQNXgmcdpCQ 1 1    206852     312 111.2mb 111.2mb
yellow open .monitoring-es-6-2020.06.04     ASjXiuhkTwuUiuZ0sXQcjw 1 1    215820     320   111mb   111mb
yellow open .monitoring-kibana-6-2020.06.01 5sXzuGv0RSaCgQqXbWBsqA 1 1      8640       0   1.9mb   1.9mb
yellow open .monitoring-es-6-2020.06.01     tKb2GWnURki-guyW4Ssmfw 1 1    190529     222 108.4mb 108.4mb
yellow open .monitoring-es-6-2020.06.02     jl_eM6k5QVCtfikiL2K40A 1 1    199123     380   109mb   109mb
yellow open .monitoring-es-6-2020.05.31     odZ0ENHVT9mhXb4IXRHK7A 1 1    181885     324 103.6mb 103.6mb
yellow open .monitoring-kibana-6-2020.06.06 1ntSQo46TQa_dxG3otXNYQ 1 1      1232       0 427.6kb 427.6kb
yellow open .monitoring-kibana-6-2020.06.05 1rP2S6Z5S0GAbOsKSdupzg 1 1      8639       0   1.9mb   1.9mb
yellow open .kibana_task_manager            oB1wUSZXRi-lYcoDr1ifLg 1 1         2       0   6.9kb   6.9kb
yellow open app_indices_two                gZ8MZoyITHWIAxFAHmcskQ 5 1      9881     174  32.2mb  32.2mb
red    open .apm-agent-configuration        tVurV4DbTaiLrR5Rh_sgeA 1 1                                  
yellow open .kibana_2                       itghNKqNR9uooWt-KlykDg 1 1        76       2  83.4kb  83.4kb
yellow open .kibana_1                       6dPwb_2gSmiSvPXfw3FMJg 1 1        12       1    43kb    43kb
yellow open kibana_sample_data_ecommerce    3KoZDmrMRhGnklR3Tom_xA 1 1      4675       0   4.7mb   4.7mb
red    open filebeat-7.7.1-2020.06.06       WqlfkqXzQ0SMZXej7Va-qA 1 1                                  
yellow open app_indices_three              0WrMg_5DQm6zbuHS14qwEA 1 1      2169       0   1.1mb   1.1mb

Update and Solution?

The issue was the archived xpack settings that can be seen in the cluster settings call above. They were a holdover from when we were on the 6.8 default distribution; X-Pack is not available in OSS.

Someone on the Elasticsearch forum gave me the answer, which I will post as an answer to this question.

Answers (2)

phil

The issue was the archived xpack settings that can be seen in the cluster settings call in the question. They were a holdover from when we were on the 6.8 default distribution; X-Pack is not available in OSS.

Someone on the Elasticsearch forum gave me the way to remove those settings.

To remove the archived X-Pack settings from an elasticsearch-oss installation after upgrading from elasticsearch-default:

curl -X PUT "localhost:9200/_cluster/settings?pretty" -H 'Content-Type: application/json' -d'
{
  "persistent" : {
    "archived.*" : null
  }
}
'

No restart was needed for me. My nodes started synchronizing instantly!
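
To verify that the archived settings are really gone, re-running the settings call in flat form makes them easy to spot (flat_settings is a standard query parameter; anything left over shows up as keys like archived.xpack.monitoring.collection.enabled):

curl -XGET 'localhost:9200/_cluster/settings?flat_settings=true&pretty'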

Then the following can be executed to re-enable allocation for all shards (it had been restricted to primaries) so that everything syncs:

curl -X PUT "localhost:9200/_cluster/settings?pretty" -H 'Content-Type: application/json' -d'
{
  "persistent": {
    "cluster.routing.allocation.enable": "all"
  }
}
'
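
After that, the health and allocation calls from the question are a quick way to confirm the replicas are coming up; for example (the 60s timeout is arbitrary):

curl -XGET 'localhost:9200/_cluster/health?wait_for_status=green&timeout=60s&pretty'
curl -XGET 'localhost:9200/_cat/allocation?v'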

hamid bayat

You should change cluster.routing.allocation.enable to "all" or null:

curl -H'content-type: application/json' -XPUT '[master-ip]:9200/_cluster/settings' -d '
{
  "persistent": { "cluster.routing.allocation.enable": "all" }
}'
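
Alternatively, setting it to null removes the setting so the cluster falls back to the default, which is "all":

curl -H'content-type: application/json' -XPUT '[master-ip]:9200/_cluster/settings' -d '
{
  "persistent": { "cluster.routing.allocation.enable": null }
}'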

Also, you have 5 red indices, so it seems you have another issue as well. First of all, check which primary shards of these indices are unassigned (using /_cat/shards, as sketched below).
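
For example, something like this lists only the unassigned shards (the grep filter is just a convenience):

curl -XGET '[master-ip]:9200/_cat/shards?v' | grep UNASSIGNED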

Then use the allocation explain API to find the problem:

curl -H'content-type: application/json' -XGET '[master-ip]:9200/_cluster/allocation/explain?pretty' -d '
{
  "index": "filebeat-7.7.1-2020.06.06",
  "shard": [unassigned shard number],
  "primary": true
}'
