RedGiant

Reputation: 4748

Optional fields in Elasticsearch

Suppose only 10 out of 1000 documents have a field called limitedEdition. Would that add any overhead to the other 990 documents, which have no value for limitedEdition? Would those documents end up with a null value/reference in the Elasticsearch index, similar to adding a nullable column in SQL?

{"_id": 1, "category": [4], "feature": [1, 2]},
{"_id": 2, "category": [5], "feature": [3, 5]},
{"_id": 3, "category": [7], "feature": [2, 4]},
.....
{"_id": 10, "category": [5], "limitedEdition": 1000}

The number of indexable fields in my project keeps growing, so I need to reconsider whether these sparse columns should be stored in Elasticsearch at all or whether the fields should be reorganized.

Upvotes: 1

Answers (1)

xeraa

Reputation: 10859

Although this question is a duplicate, there has been a recent development in this area: with Lucene 7 (shipped with Elasticsearch 6.0), the handling of sparse doc values improved considerably:

With these changes, you finally only pay for what you actually use with doc values, in index size, indexing performance, etc. This is the same as other parts of the index like postings, stored fields, term vectors, etc., and it means users with very sparse doc values no longer see merges taking unreasonably long time or the index becoming unexpectedly huge while merging.

From http://blog.mikemccandless.com/2017/03/apache-lucene-70-is-coming-soon.html.

You can see the effect of the change in the benchmark results at https://home.apache.org/~mikemccand/lucenebench/sparseResults.html.
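To make the "only pay for what you use" point concrete, here is a toy Python model. It is an illustration under stated assumptions, not Lucene's actual on-disk encoding, which is block-based and considerably more elaborate. It contrasts a dense layout, where every document gets a slot much like a nullable SQL column, with a sparse layout that stores only the documents that actually have a value and is read through a forward-only advance() iterator, loosely inspired by the iterator-style doc values access in Lucene 7. The names DenseColumn, SparseColumn, advance and NO_MORE_DOCS are hypothetical, and the data mirrors the question: 1000 documents, 10 of which have limitedEdition.

```python
from bisect import bisect_left
from typing import Dict, List, Optional

# Hypothetical "no more documents" sentinel for the toy iterator below.
NO_MORE_DOCS = 2**31 - 1


class DenseColumn:
    """One slot per document, like a nullable SQL column: missing values still use a slot."""

    def __init__(self, max_doc: int, values: Dict[int, int]) -> None:
        self.slots: List[Optional[int]] = [None] * max_doc
        for doc_id, value in values.items():
            self.slots[doc_id] = value

    def stored_entries(self) -> int:
        return len(self.slots)


class SparseColumn:
    """Stores only documents that have a value, as parallel sorted lists."""

    def __init__(self, values: Dict[int, int]) -> None:
        self.doc_ids: List[int] = sorted(values)
        self.values: List[int] = [values[d] for d in self.doc_ids]
        self._pos = -1  # not positioned on any document yet

    def stored_entries(self) -> int:
        return len(self.doc_ids)

    def advance(self, target: int) -> int:
        """Move forward to the first document >= target that has a value.

        Returns that document id, or NO_MORE_DOCS when the column is exhausted.
        As with a forward-only iterator, target must not go backwards.
        """
        self._pos = bisect_left(self.doc_ids, target, max(self._pos, 0))
        if self._pos >= len(self.doc_ids):
            return NO_MORE_DOCS
        return self.doc_ids[self._pos]

    def value(self) -> int:
        """Value of the document the iterator is currently positioned on."""
        return self.values[self._pos]


# Scenario from the question: 1000 documents, only 10 have limitedEdition.
MAX_DOC = 1000
limited_edition = {doc_id: 1000 for doc_id in range(0, MAX_DOC, 100)}

dense = DenseColumn(MAX_DOC, limited_edition)
sparse = SparseColumn(limited_edition)

print("dense entries:", dense.stored_entries())    # 1000
print("sparse entries:", sparse.stored_entries())  # 10

# Walk only the documents that actually have the field.
found = []
doc = sparse.advance(0)
while doc != NO_MORE_DOCS:
    found.append((doc, sparse.value()))
    doc = sparse.advance(doc + 1)
print(found[:3])  # [(0, 1000), (100, 1000), (200, 1000)]
```

In the dense layout the 990 documents without limitedEdition still occupy a slot each; in the sparse layout they cost nothing in this model, which is the behaviour the quote describes for doc values since Lucene 7.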

Upvotes: 1