Andreas Tschritter

Reputation: 11

Improving Elasticsearch replica creation performance

I have an Elasticsearch setup that until now consisted of one machine (16 cores, 64 GB RAM, 4x800 GB SSD) holding 1.5 TB of log data in 450 indexes and running ES 5.1.

I have now added a second, identical server to the cluster. Both are connected via a 10 Gbit network.

All indexes have 1 shard, and I configured them to have 1 replica once the second server came online.

Now, replicas are being created, but only slowly. Load on both machines is below 1 and IO rates are at about 2MB/s or less.
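
One way to watch how fast the replicas are actually being built is the index recovery API. This is only a minimal sketch: it assumes a node reachable at `localhost:9200`, and both the `ES_HOST` variable and the function name are placeholders rather than anything from my setup.

```shell
# Placeholder host; replace with the address of any node in the cluster.
ES_HOST="${ES_HOST:-localhost:9200}"

# List only the shard recoveries that are currently running.
# The response names the source and target node of each recovery
# and reports how many bytes have been copied so far.
show_active_recoveries() {
  curl -s "http://${ES_HOST}/_recovery?active_only=true&pretty"
}
```

Calling `show_active_recoveries` a few times in a row shows whether the byte counters are advancing and between which addresses the data is moving.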

I am running the following settings:

{
  "persistent": {
    "cluster": {
      "routing": {
        "allocation": {
          "node_concurrent_incoming_recoveries": "20",
          "node_initial_primaries_recoveries": "8",
          "node_concurrent_outgoing_recoveries": "20"
        }
      }
    },
    "indices": {
      "recovery": {
        "max_bytes_per_sec": "400mb"
      },
      "store": {
        "throttle": {
          "type": "none"
        }
      }
    }
  },
  "transient": {
    "logger": {
      "org": {
        "elasticsearch": {
          "indices": "DEBUG"
        }
      }
    }
  }
}

indices.store.throttle.type does not seem to exist anymore in ES 5.

At the current rate, transferring all data will take multiple weeks.
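
For scale, here is a rough back-of-the-envelope check. It treats 1.5 TB as 1,500,000 MB and assumes the rate stays constant; the function name is purely illustrative, and the integer division rounds down.

```shell
# Integer estimate of the transfer time in hours.
# $1 = data size in MB, $2 = sustained rate in MB/s
eta_hours() {
  echo $(( $1 / $2 / 3600 ))
}

eta_hours 1500000 2     # 1.5 TB at the observed ~2 MB/s      -> 208
eta_hours 1500000 400   # 1.5 TB at the configured 400 MB/s cap -> 1
```

Even at the upper bound of 2 MB/s this is about 208 hours, more than a week, and any rate below that stretches it further. At the configured 400 MB/s cap the same data would take roughly an hour.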

Upvotes: 0

Views: 703

Answers (2)

Andreas Tschritter

Reputation: 11

I just found the issue here: the nodes announced the IPs of the wrong network cards, so all the data was routed via a slow link.

I should have thought of iftop earlier.

After changing the announced IPs, replication ran at more than 800 MB/s.
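
For anyone hitting the same thing, here is a sketch of how the announced addresses can be checked and pinned. The host `localhost:9200`, the interface `eth0` and the address `10.0.0.1` are placeholders, not the values from my cluster.

```shell
# Placeholder host; replace with the address of any node in the cluster.
ES_HOST="${ES_HOST:-localhost:9200}"

# Show the transport address each node publishes to the rest of the cluster.
show_published_addresses() {
  curl -s "http://${ES_HOST}/_nodes/transport?pretty"
}

# To see which interface the recovery traffic really uses (placeholder name):
#   sudo iftop -i eth0
#
# To pin the announced address, set it in elasticsearch.yml on each node
# and restart the node (placeholder IP on the fast interface):
#   network.publish_host: 10.0.0.1
```

If the publish address reported for a node is not on the fast network, that node is announcing the wrong interface.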

Upvotes: 1

Tomasz Swider

Reputation: 2382

When I need to speed up shard reassignment after a server reboot, I temporarily apply the following settings, and the assignment gets much faster. It might help here too.

    curl -XPUT 'eshostname:9200/_cluster/settings' -d '{
      "transient": {
        "cluster.routing.allocation.allow_rebalance": "indices_all_active",
        "cluster.routing.allocation.node_concurrent_recoveries": 160,
        "cluster.routing.allocation.node_initial_primaries_recoveries": 100,
        "cluster.routing.allocation.enable": "all",
        "indices.recovery.max_bytes_per_sec": "400mb",
        "indices.recovery.concurrent_streams": 30,
        "indices.recovery.concurrent_small_file_streams": 30
      }
    }'
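
Since these values are meant to be temporary, they can be cleared again once the shards are assigned. This is a sketch that assumes ES 5.x, where assigning `null` resets a setting to its default; `eshostname` is the same placeholder as in the command above, and the function name is made up for illustration.

```shell
# Reset some of the transient recovery overrides back to their defaults.
reset_recovery_overrides() {
  curl -XPUT 'eshostname:9200/_cluster/settings' \
    -H 'Content-Type: application/json' -d '{
    "transient": {
      "cluster.routing.allocation.node_concurrent_recoveries": null,
      "cluster.routing.allocation.node_initial_primaries_recoveries": null,
      "indices.recovery.max_bytes_per_sec": null
    }
  }'
}
```

Transient settings are also dropped on a full cluster restart, so the explicit reset only matters while the cluster stays up.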

Upvotes: 0
