Heschoon

Reputation: 3019

Faking Index per User: is having many aliases bad?

The Faking Index per User article on the Elastic / Elasticsearch website recommends using a single index for many (thousands of?) clients, with filtered aliases keeping each client's data separate transparently.

I heard someone say that this isn't good practice because aliases are part of the cluster state.

Why is that the case? This is the first time I have heard this.

Upvotes: 1

Views: 533

Answers (1)

Zach

Reputation: 9721

There's nothing wrong with aliases per se. The alias is very lightweight: when you create an alias, it looks up the index and places an "alias tag" on that index.

When you execute a search against an alias, Elasticsearch first looks for an index with that name; if there is none, it checks the alias tags and uses the underlying index. The whole process is very light, so from a search perspective there is really no problem with having many aliases.
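For reference, here is a minimal sketch of what "one filtered alias per user" looks like as a request body for the `_aliases` API. The index name `shared_index`, the field `user_id`, and the `user_<id>` naming scheme are assumptions made up for illustration; the helper only builds the JSON and does not talk to a cluster.

```python
import json


def build_alias_actions(index, user_ids, field="user_id"):
    """Build a body for POST /_aliases adding one filtered alias per user.

    `index`, `field` and the alias naming scheme are hypothetical choices.
    """
    actions = []
    for uid in user_ids:
        actions.append({
            "add": {
                "index": index,
                "alias": f"user_{uid}",
                # Searches through the alias only see this user's documents.
                "filter": {"term": {field: uid}},
                # Routing on the user id keeps each user's data on one shard.
                "routing": str(uid),
            }
        })
    return {"actions": actions}


body = build_alias_actions("shared_index", [12, 34])
print(json.dumps(body, indent=2))
```

Each entry in `actions` becomes one alias record in the cluster state, which is why the number of aliases matters for the discussion that follows.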

The note about Cluster State, however, is valid (sorta). Millions of aliases (or millions of fields, etc) will bloat up the cluster state. This cluster state is published to all nodes whenever there is a change, which is how Elasticsearch guarantees that all nodes can respond to all queries.

So the problem is that if your cluster state becomes massive (hundreds of megabytes, etc.), the physical act of publishing it to the cluster becomes non-negligible. Imagine publishing an 800 MB file to 100 nodes every time a field or alias is added. There is also a CPU cost on the master, which can become a problem.
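To put a number on that worst case, here is a back-of-the-envelope calculation. It assumes the full, uncompressed state is sent to every node on every change, which is a deliberately naive model and not what a real cluster necessarily does.

```python
MB = 10**6
GB = 10**9


def full_publish_bytes(state_bytes, node_count):
    """Bytes the master would send if it shipped the whole state to each node."""
    return state_bytes * node_count


total = full_publish_bytes(800 * MB, 100)
print(total / GB)  # 80.0 (GB per cluster state change)
```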

In practice there are a lot of tricks to keep this manageable, like compression, diffs between cluster states, batching, etc. But fundamentally the cluster state represents a single bottleneck which can become a problem if you let the state grow too large.

In the real world, few clusters ever reach this problem, since it requires a very large number of fields / aliases / indices / analyzers to actually bloat a cluster state to such a large size.

If you are concerned about this, you can keep an eye on the Pending Tasks API. Pending Tasks will show all cluster-level tasks that are queued up to process on the master node. It should almost always be empty, since the master is rarely the bottleneck in a cluster. But if you see this queue growing (and high load on your master), you may have a cluster state issue.
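A rough sketch of such a check, using the `GET /_cluster/pending_tasks` endpoint. The base URL is an assumption (a local, unsecured node), and the sample response below is made-up data shaped like what the API returns, so the summary can be tried without a cluster.

```python
import json
import urllib.request


def fetch_pending_tasks(base_url="http://localhost:9200"):
    """Fetch the pending cluster tasks; base_url is an assumed local node."""
    with urllib.request.urlopen(f"{base_url}/_cluster/pending_tasks") as resp:
        return json.load(resp)


def summarize_pending_tasks(response):
    """Return the queue length and the longest wait in milliseconds."""
    tasks = response.get("tasks", [])
    longest = max((t.get("time_in_queue_millis", 0) for t in tasks), default=0)
    return {"count": len(tasks), "longest_wait_ms": longest}


# Made-up sample in the shape of a pending_tasks response.
sample = {
    "tasks": [
        {"insert_order": 101, "priority": "URGENT",
         "source": "index-aliases", "time_in_queue_millis": 1200},
        {"insert_order": 102, "priority": "HIGH",
         "source": "put-mapping", "time_in_queue_millis": 300},
    ]
}

summary = summarize_pending_tasks(sample)
print(summary)  # {'count': 2, 'longest_wait_ms': 1200}
```

Against a live cluster you would call `summarize_pending_tasks(fetch_pending_tasks())` periodically; a count that stays above zero and keeps climbing is the signal to look at.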

Upvotes: 5
