GISScientist

Reputation: 1

What does Elasticsearch keep in memory, besides the inverted index, that makes search so fast?

What does Elasticsearch keep in memory that makes search so fast? Are all the JSON documents themselves held in memory, or are only the inverted index and the mapping kept in memory 24/7?

Upvotes: 0

Views: 1048

Answers (1)

Nikolay Vasiliev

Reputation: 6076

It is a good question, and the answer in short is:

It is not only keeping data in memory that makes Elasticsearch searches so fast.

Inverted indexes are not guaranteed to always be held in memory. I couldn't find direct proof, so I infer this from the following:

  • index segments may not be loaded into memory completely (see the size.memory column in the _cat/segments output)
  • the very first advice in Tune for search speed is:

    Give memory to the filesystem cache
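To compare how much of each segment sits on disk versus in memory, you can ask _cat/segments for just the relevant columns. Below is a minimal sketch using only the Python standard library; the host http://localhost:9200 and the index name my-index are assumptions for illustration, and the size.memory column may be reported differently depending on your Elasticsearch version.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def segments_url(base_url, index):
    """Build a _cat/segments URL asking for on-disk size and in-memory size."""
    params = urlencode({
        "format": "json",
        # "size" is the segment size on disk, "size.memory" the part held in memory.
        "h": "index,shard,segment,size,size.memory",
    })
    return f"{base_url.rstrip('/')}/_cat/segments/{index}?{params}"


def fetch_segments(base_url, index):
    """Fetch the segment list as parsed JSON (needs a reachable cluster)."""
    with urlopen(segments_url(base_url, index)) as resp:
        return json.load(resp)


# Hypothetical host and index name; replace with your own.
url = segments_url("http://localhost:9200", "my-index")
print(url)
```

Calling fetch_segments("http://localhost:9200", "my-index") against a running cluster returns one entry per segment, so you can see that size.memory is typically much smaller than size.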

This means that Elasticsearch also stores index data on disk in quite a smart way, so the filesystem itself helps with frequently run searches.

One such "life hack" is that each field in the mapping gets its own inverted index, which is small enough to be cached efficiently by the filesystem if it is queried frequently (fields you never query just occupy disk space).
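To make the "one inverted index per field" idea concrete, here is a toy sketch in Python. It is not how Lucene actually lays out its data; the whitespace tokenizer, the function name build_field_indexes, and the sample documents are all made up for illustration.

```python
from collections import defaultdict


def build_field_indexes(docs):
    """Return {field: {term: sorted doc ids}}, i.e. one inverted index per field."""
    indexes = defaultdict(lambda: defaultdict(set))
    for doc_id, doc in enumerate(docs):
        for field, text in doc.items():
            # Naive analysis: lowercase and split on whitespace.
            for term in str(text).lower().split():
                indexes[field][term].add(doc_id)
    return {
        field: {term: sorted(ids) for term, ids in terms.items()}
        for field, terms in indexes.items()
    }


# Made-up sample documents.
docs = [
    {"title": "fast search", "body": "inverted index on disk"},
    {"title": "slow scan", "body": "full scan of every document"},
]
indexes = build_field_indexes(docs)

# A query on "title" only touches the title index; "body" is never read.
print(indexes["title"]["fast"])  # [0]
print(indexes["body"]["scan"])   # [1]
```

Because each field has its own structure, a workload that only ever queries title keeps just that small index hot in the filesystem cache.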

So does Elasticsearch store original JSONs in memory?

No, it stores them on disk in a special field called _source. Retrieving it is not fast, which is why scripts that access _source may be slow to execute.

Are there other data structures that make Elasticsearch fast?

Yes, for example, the ones used for aggregations:

  • doc_values, a column-oriented store for exact-value fields (this feature makes Elasticsearch a little bit of a columnar DB); again, it is not held in memory to begin with and gets "cached" upon frequent use;
  • fielddata, which does a similar job but for text fields; it is actually stored in memory, but it is not efficient and is turned off by default.
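The column-oriented idea behind doc_values can be sketched in a few lines of Python. This only illustrates the layout, not Elasticsearch's real on-disk format; the helper to_columns and the sample documents are made up.

```python
def to_columns(docs, fields):
    """Turn row-oriented documents into {field: [value per doc id]}."""
    return {field: [doc.get(field) for doc in docs] for field in fields}


# Made-up sample documents (row-oriented, like _source).
docs = [
    {"price": 10, "color": "red"},
    {"price": 25, "color": "blue"},
    {"price": 5, "color": "red"},
]
columns = to_columns(docs, ["price", "color"])

# An aggregation on "price" scans one contiguous column
# instead of parsing every whole document.
total = sum(columns["price"])
print(columns["price"])  # [10, 25, 5]
print(total)             # 40
```

An aggregation or sort needs the value of one field for many documents, which is exactly the access pattern a column layout serves well.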

What else does Elasticsearch do to speed up the search?

It uses more caching: the shard request cache and the node query cache. As you can see, it is not as simple as "just put data in memory".

Hope that helps!

Upvotes: 4
