Reputation: 21
We're currently optimizing the sharding setup of our Elasticsearch index to (surprise) decrease response times. Currently the number of routing keys equals the number of shards. We're looking for a setup where all documents in a shard have the same routing key.
This is how it looks at the moment and how it should look:
Current
Wanted
Is there any way to make sure that each routing key is routed to exactly one shard of its own? Currently we're seeing empty shards, which doesn't seem to be an appropriate solution.
We know that routing is based on Murmur hashing in version 5.5.0 (see: Murmur3HashFunction.java). Is there any option to influence this behavior, and can someone offer deeper insight into how the routing works internally?
Upvotes: 1
Views: 777
Reputation: 21
To summarize the outcome: It's not possible.
Why? To work for most use cases, documents are not assigned to shards directly by routing key, because the documents could end up distributed very unevenly if the routing keys themselves are distributed that way (not in my case, but in general they might be). Hashing the routing key evens this out, and even if all documents with a certain routing key disappear, this will not leave a shard empty.
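To illustrate why a setup with as many routing keys as shards tends to leave some shards empty, here is a rough back-of-the-envelope sketch. It assumes the hash spreads keys uniformly and independently over the shards, which is only an idealization; the function names are made up for this example.

```python
from math import factorial


def prob_all_keys_on_distinct_shards(n: int) -> float:
    """Chance that n keys hashed uniformly onto n shards each get their own shard."""
    # n! collision-free assignments out of n**n possible assignments.
    return factorial(n) / n ** n


def expected_empty_shards(num_keys: int, num_shards: int) -> float:
    """Expected number of shards that receive no key at all."""
    # A given shard is missed by one key with probability (1 - 1/num_shards).
    return num_shards * (1 - 1 / num_shards) ** num_keys


if __name__ == "__main__":
    for n in (2, 5, 10):
        print(n, prob_all_keys_on_distinct_shards(n), expected_empty_shards(n, n))
```

Under that assumption, a one-key-per-shard layout becomes very unlikely as the number of keys grows, so empty shards are the expected result rather than a misconfiguration.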
You can build a workaround based on knowledge of the hashing function used (Murmur), but it might break if the Elasticsearch team decides to change the hashing function. This has already happened, so it's not safe to rely on such a hidden feature.
The only way to achieve this is to create a separate index for each routing key, as pointed out by Val.
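For what it's worth, the workaround could look roughly like the sketch below. It rests on assumptions that should be checked against the source of the exact Elasticsearch version in use: that the routing string is hashed as UTF-16 little-endian bytes with MurmurHash3 (x86, 32-bit, seed 0), and that the shard is the floor-modulo of the signed hash by the number of primary shards. `es_shard_for` and `find_routing_values` are hypothetical helper names, not Elasticsearch APIs.

```python
def murmur3_x86_32(data: bytes, seed: int = 0) -> int:
    """MurmurHash3 x86 32-bit, returned as an unsigned 32-bit integer."""
    c1, c2, mask = 0xCC9E2D51, 0x1B873593, 0xFFFFFFFF
    h = seed & mask
    n_blocks = len(data) // 4
    for i in range(n_blocks):
        k = int.from_bytes(data[4 * i:4 * i + 4], "little")
        k = (k * c1) & mask
        k = ((k << 15) | (k >> 17)) & mask  # rotate left by 15
        k = (k * c2) & mask
        h ^= k
        h = ((h << 13) | (h >> 19)) & mask  # rotate left by 13
        h = (h * 5 + 0xE6546B64) & mask
    tail = data[4 * n_blocks:]
    if tail:  # 1 to 3 leftover bytes
        k = int.from_bytes(tail, "little")
        k = (k * c1) & mask
        k = ((k << 15) | (k >> 17)) & mask
        k = (k * c2) & mask
        h ^= k
    # Finalization mix.
    h ^= len(data)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & mask
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & mask
    h ^= h >> 16
    return h


def es_shard_for(routing: str, num_primary_shards: int) -> int:
    """ASSUMED routing rule: floorMod(murmur3(UTF-16LE bytes of routing), shards)."""
    h = murmur3_x86_32(routing.encode("utf-16-le"))
    signed = h - (1 << 32) if h & 0x80000000 else h  # mimic a signed 32-bit int
    return signed % num_primary_shards  # Python's % is already a floor-modulo


def find_routing_values(num_primary_shards: int, prefix: str = "key-",
                        limit: int = 100_000) -> dict:
    """Brute-force one candidate routing value per shard (hypothetical helper)."""
    found = {}
    for i in range(limit):
        candidate = f"{prefix}{i}"
        found.setdefault(es_shard_for(candidate, num_primary_shards), candidate)
        if len(found) == num_primary_shards:
            break
    return found


if __name__ == "__main__":
    print(find_routing_values(5))
```

The result maps each shard number to a routing value that, under the assumptions above, would hash to it. The mapping is only valid for one shard count and one hashing scheme, which is exactly why it is fragile.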
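A minimal sketch of the index-per-key idea, assuming each index gets a single primary shard so that one routing key maps to exactly one shard. The `docs-` prefix and both helper names are invented for illustration; only the `number_of_shards` setting is a real Elasticsearch index setting.

```python
import re


def index_name_for(routing_key: str, prefix: str = "docs-") -> str:
    """Derive a lowercase index name from a routing key (hypothetical naming scheme)."""
    # Index names must be lowercase; replace anything outside a conservative
    # character set with a dash. Distinct keys can collide after this step,
    # so a real setup would need to check for that.
    slug = re.sub(r"[^a-z0-9_-]+", "-", routing_key.lower()).strip("-")
    if not slug:
        raise ValueError(f"cannot derive an index name from {routing_key!r}")
    return f"{prefix}{slug}"


def index_settings() -> dict:
    """Request body for creating an index with a single primary shard."""
    return {"settings": {"number_of_shards": 1}}


if __name__ == "__main__":
    print(index_name_for("Customer A"), index_settings())
```

The application then writes to and searches the index named after the key instead of passing a routing value.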
Upvotes: 1