Reputation: 33
I've read a number of articles and forum threads on the placement of indexes/shards, but have not yet found a solution to my requirement.
Fundamentally, I want to use Logstash (+ Elasticsearch/Kibana) to build a globally distributed cluster, but I want to restrict primary and replica shards to the region they were created in (to reduce WAN traffic), while still being able to query all data as a single dataset.
Let's say I have two ES nodes in UK (uknode1/uknode2), and two in US (usnode1/usnode2). If Logstash sends some data to usnode1, I want it to place the replica on usnode2, and not send this across the WAN to the uknode* nodes.
I've tried playing around with index and routing allocation settings, but cannot stop the shards being distributed across all 4 nodes. It's slightly complicated by the fact that index names are built dynamically based on the "type", but that's another challenge for a later date. Even with one index, I can't work this out.
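For reference, this is the kind of thing I've been trying in each node's elasticsearch.yml - a custom node attribute per region plus shard allocation awareness (the attribute name "region" is just my choice):

```yaml
# On usnode1/usnode2 (the uknode* nodes would use region: uk)
# Custom attribute identifying which region this node lives in
node.region: us

# Allocation awareness spreads primaries and replicas ACROSS values of
# "region" - the opposite of what I want, so shards still end up in both regions
cluster.routing.allocation.awareness.attributes: region
```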
I could split this into two separate clusters, but I want to be able to query all nodes as a single dataset (via Kibana), and Kibana can only query one cluster, so I don't think that is a valid option at this stage.
Is this even possible to achieve?
Part of the reason I ask whether this is possible is what would happen if I wrote to an index called "myTest" on a UK node and to the same index on a US node. As this is ultimately the same index, I'm not sure how ES would handle it.
So if anyone has any suggestions, or just to say "not possible", that would be very helpful.
Upvotes: 2
Views: 1137
Reputation: 33
Thanks. I looked into this a bit more and have the solution, which is indeed to use tribe nodes.
For anyone who isn't familiar with them, this is a new feature in ES 1.0.0+
You allocate a new ES node as a tribe node and configure it to connect to all your other clusters. When you run a query against it, it queries all the clusters and returns a consolidated set of results from them.
So in my scenario, I have two distinct clusters, one in each region, configured something like this.
US Region
cluster.name: us-region
Two nodes in this region, called usnode1 and usnode2
Both nodes are master/data nodes
UK Region
cluster.name: uk-region
Two nodes in this region, called uknode1 and uknode2
Both nodes are master/data nodes
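For completeness, the per-node config in each region is nothing special - just a cluster name and unicast discovery (the hostnames are mine; something like this, with the obvious substitutions for the UK nodes):

```yaml
# elasticsearch.yml on usnode1/usnode2 (uk nodes use uk-region / uknode*)
cluster.name: us-region
node.master: true
node.data: true
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["usnode1", "usnode2"]
```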
Then you create another ES node and add some configuration to make it a tribe node.
Edit elasticsearch.yml with something like this:
node.data: false
node.master: false
tribe.blocks.write: false
tribe.blocks.metadata: false
tribe.t1.cluster.name: us-region
tribe.t1.discovery.zen.ping.unicast.hosts: ["usnode1","usnode2"]
tribe.t2.cluster.name: uk-region
tribe.t2.discovery.zen.ping.unicast.hosts: ["uknode1","uknode2"]
You then point Kibana at the tribe node, and it works brilliantly - an excellent feature.
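I'm on Kibana 3, so pointing it at the tribe node is just a one-line change in config.js (replace "tribenode" with your tribe node's hostname):

```javascript
// config.js - send all Kibana queries to the tribe node,
// which fans them out to both regional clusters
elasticsearch: "http://tribenode:9200",
```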
Kibana dashboards still save, although I'm not sure yet how it picks which cluster to save to. It seems to address my question, so with a bit more playing I think I'll have it sorted.
Upvotes: 1
Reputation: 30163
It's possible, but not recommended. Elasticsearch needs a reliable data connection between the nodes in a cluster to function, which is difficult to ensure for a geographically distributed cluster. A better solution would be to have two clusters, one in the UK and another in the US. If you need to search both of them at the same time, you can use a tribe node.
Upvotes: 3