Reputation: 11
I have an Elasticsearch setup that until now consisted of one machine (16 cores, 64 GB RAM, 4x800 GB SSD) holding 1.5 TB of log data in 450 indexes, running ES 5.1.
Now I added a second, identical server to the cluster. Both are connected via a 10 Gbit network.
All indexes have 1 shard, and I have configured them to have 1 replica after the second server went online.
Replicas are now being created, but only slowly. Load on both machines is below 1 and I/O rates are about 2 MB/s or less.
I am running the following settings:
{
  "persistent": {
    "cluster": {
      "routing": {
        "allocation": {
          "node_concurrent_incoming_recoveries": "20",
          "node_initial_primaries_recoveries": "8",
          "node_concurrent_outgoing_recoveries": "20"
        }
      }
    },
    "indices": {
      "recovery": {
        "max_bytes_per_sec": "400mb"
      },
      "store": {
        "throttle": {
          "type": "none"
        }
      }
    }
  },
  "transient": {
    "logger": {
      "org": {
        "elasticsearch": {
          "indices": "DEBUG"
        }
      }
    }
  }
}
indices.store.throttle.type does not seem to exist anymore in ES 5.
At the current rate, transferring all data will take multiple weeks.
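As a rough sanity check on that estimate, here is a small sketch of the arithmetic. It assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and that the full 1.5 TB has to be copied once for the replicas; the function name `transfer_days` is just for illustration.

```python
SECONDS_PER_DAY = 86_400


def transfer_days(total_bytes: float, bytes_per_sec: float) -> float:
    """Days needed to copy total_bytes at a steady bytes_per_sec."""
    return total_bytes / bytes_per_sec / SECONDS_PER_DAY


total = 1.5e12  # 1.5 TB of log data, decimal units (assumption)

# Observed rate: about 2 MB/s
print(f"{transfer_days(total, 2e6):.1f} days at 2 MB/s")

# Configured limit: indices.recovery.max_bytes_per_sec = 400mb
print(f"{transfer_days(total, 400e6) * 24 * 60:.1f} minutes at 400 MB/s")
```

At a steady 2 MB/s this comes out at well over a week, and any rate below that pushes it into multiple weeks, while the configured 400 MB/s limit would finish in about an hour. So the recovery throttle is clearly not what is limiting the transfer.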
Upvotes: 0
Views: 703
Reputation: 11
I just found the issue here - the nodes announced the IPs of the wrong network cards and all the data was routed via a slow link.
I should have thought of iftop earlier.
After changing the announced IPs, replication worked with > 800MB/s.
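For anyone hitting the same problem, one way to see which addresses the nodes announce is the nodes info API (`GET /_nodes/transport`), which reports a `publish_address` per node. Below is a minimal sketch that extracts those addresses. The host name, node names and IPs in it are made up, and the response is trimmed to the fields used; the original post does not say which setting was changed, but `network.publish_host` in `elasticsearch.yml` is the usual place to control the announced address.

```python
import json
from urllib.request import urlopen


def publish_addresses(nodes_info: dict) -> dict:
    """Map node name -> transport publish address from a _nodes/transport response."""
    return {
        node["name"]: node["transport"]["publish_address"]
        for node in nodes_info["nodes"].values()
    }


def fetch_publish_addresses(base_url: str) -> dict:
    """Query a live cluster, e.g. base_url='http://eshostname:9200' (hypothetical host)."""
    with urlopen(base_url + "/_nodes/transport") as resp:
        return publish_addresses(json.load(resp))


# Hypothetical, trimmed sample response for illustration only.
sample = {
    "nodes": {
        "id-1": {"name": "es-node-1", "transport": {"publish_address": "192.168.1.10:9300"}},
        "id-2": {"name": "es-node-2", "transport": {"publish_address": "192.168.1.11:9300"}},
    }
}

for name, address in sorted(publish_addresses(sample).items()):
    print(name, address)
```

If the printed addresses belong to the slow interface rather than the 10 Gbit one, recovery traffic between the nodes will go over the slow link regardless of the recovery settings.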
Upvotes: 1
Reputation: 2382
When I need to speed up shard reassignment after a server reboot, I temporarily apply the following settings, and allocation gets much faster. It might help in your case.
curl -XPUT 'eshostname:9200/_cluster/settings' -d '{
  "transient": {
    "cluster.routing.allocation.allow_rebalance": "indices_all_active",
    "cluster.routing.allocation.node_concurrent_recoveries": 160,
    "cluster.routing.allocation.node_initial_primaries_recoveries": 100,
    "cluster.routing.allocation.enable": "all",
    "indices.recovery.max_bytes_per_sec": "400mb",
    "indices.recovery.concurrent_streams": 30,
    "indices.recovery.concurrent_small_file_streams": 30
  }
}'
Upvotes: 0