Improving Elasticsearch replica creation performance

Question

I have an Elasticsearch setup that until now consisted of a single machine (16 cores, 64 GB RAM, 4x800 GB SSD) holding 1.5 TB of log data in 450 indexes and running ES 5.1.

I have now added a second, identical server to the cluster. The two machines are connected via a 10 Gbit/s network.

All indexes have 1 shard, and I configured them to have 1 replica once the second server was online.

The replicas are being created, but only slowly: load on both machines is below 1 and I/O rates are about 2 MB/s or less.

I am running the following settings:

{
  "persistent": {
    "cluster": {
      "routing": {
        "allocation": {
          "node_concurrent_incoming_recoveries": "20",
          "node_initial_primaries_recoveries": "8",
          "node_concurrent_outgoing_recoveries": "20"
        }
      }
    },
    "indices": {
      "recovery": {
        "max_bytes_per_sec": "400mb"
      },
      "store": {
        "throttle": {
          "type": "none"
        }
      }
    }
  },
  "transient": {
    "logger": {
      "org": {
        "elasticsearch": {
          "indices": "DEBUG"
        }
      }
    }
  }
}

Note that indices.store.throttle.type no longer seems to exist in ES 5.

At the current rate, transferring all data will take multiple weeks.
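As a rough sanity check on that estimate, the arithmetic can be sketched as below. It assumes decimal units (1 TB = 10^12 bytes) and a constant rate; the helper name `transfer_days` is made up for this illustration. Only the 1.5 TB and 2 MB/s figures come from the description above.

```python
def transfer_days(total_bytes: float, bytes_per_sec: float) -> float:
    """Return the days needed to copy total_bytes at a constant rate."""
    seconds_per_day = 86_400
    return total_bytes / bytes_per_sec / seconds_per_day


# Decimal units are an assumption of this sketch.
TB = 10**12
MB = 10**6

# 1.5 TB of index data at the observed upper bound of 2 MB/s.
days_at_2mb = transfer_days(1.5 * TB, 2 * MB)
print(f"{days_at_2mb:.1f} days at 2 MB/s")

# Halving the rate doubles the time, which is how "2 MB/s or less"
# stretches into multiple weeks.
days_at_1mb = transfer_days(1.5 * TB, 1 * MB)
print(f"{days_at_1mb:.1f} days at 1 MB/s")
```

Even at the full 2 MB/s the copy takes more than a week, so any sustained rate below that pushes the total well past two weeks.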

Andreas Tschritter · Accepted Answer

I just found the issue: the nodes were announcing the IPs of the wrong network cards, so all the data was routed via a slow link.

I should have thought of iftop earlier.

After changing the announced IPs, replication ran at more than 800 MB/s.
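One way to spot this kind of problem without iftop is to look at the address each node publishes for transport traffic. The sketch below is a minimal example, assuming the parsed JSON of a `GET _nodes/transport` response is already available as a Python dict. The helper name `publish_addresses`, the node names and the IP addresses are all hypothetical sample data, not values from this cluster.

```python
def publish_addresses(nodes_info: dict) -> dict:
    """Map each node name to the transport address it announces."""
    return {
        node["name"]: node["transport"]["publish_address"]
        for node in nodes_info["nodes"].values()
    }


# Hypothetical sample shaped like a `GET _nodes/transport` response,
# trimmed to the fields this sketch reads.
sample = {
    "nodes": {
        "node-id-1": {
            "name": "es1",
            "transport": {"publish_address": "192.168.1.10:9300"},
        },
        "node-id-2": {
            "name": "es2",
            "transport": {"publish_address": "192.168.1.11:9300"},
        },
    }
}

# Print one line per node so the announced network is easy to eyeball.
for name, address in publish_addresses(sample).items():
    print(name, address)
```

If the printed addresses belong to the slow network rather than the 10 Gbit/s one, the announced address needs to change. The answer does not say which setting was edited; `network.publish_host` in elasticsearch.yml is one setting that controls it.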
