Reputation: 1944
Is there a way to find the size taken by individual fields in the index?
I have 10 fields and _source is disabled. I have not defined a mapping for the fields.
With _all enabled, the index size on disk was 95 MB.
With _all disabled, the index size on disk was 70 MB.
My understanding is that _all stores a copy of all the fields, so wouldn't enabling _all double the index size? Why is the difference only 25 MB rather than about 47 MB?
Thanks
Upvotes: 1
Views: 326
Reputation: 33341
In addition to bsarkar's excellent answer: _all is an index-only field (by default, anyway), meaning it is not stored. A field that is both stored and indexed, i.e. one that can be searched and can also be returned with a search result, needs an inverted index and must also keep its raw contents so they can be retrieved later. Storing the entire field contents can take up a very significant amount of space.
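To make the indexed-versus-stored distinction concrete, here is a toy Python model. It is a sketch under stated assumptions: `ToyField`, its methods and the sample documents are hypothetical and do not reflect Lucene's or Elasticsearch's actual API or on-disk format. Both variants build the same inverted index, but only the stored variant keeps a raw copy of each value, which is the extra cost described above.

```python
from collections import defaultdict


class ToyField:
    """Hypothetical toy field: always indexed, optionally stored (not Lucene's API)."""

    def __init__(self, stored):
        self.stored = stored
        self.postings = defaultdict(set)  # inverted index: term -> doc IDs
        self.raw = {}                     # doc ID -> original text, only if stored

    def add(self, doc_id, text):
        # Indexing: record which documents contain each term.
        for term in text.split():
            self.postings[term].add(doc_id)
        # Storing: keep the full original value as well (the extra disk cost).
        if self.stored:
            self.raw[doc_id] = text

    def search(self, term):
        return sorted(self.postings.get(term, set()))

    def retrieve(self, doc_id):
        # An index-only field cannot return its original value.
        return self.raw.get(doc_id)


docs = {"d1": "quick brown fox", "d2": "lazy brown dog"}
index_only = ToyField(stored=False)
stored = ToyField(stored=True)
for doc_id, text in docs.items():
    index_only.add(doc_id, text)
    stored.add(doc_id, text)

print(index_only.search("brown"), index_only.retrieve("d1"))
print(stored.search("brown"), stored.retrieve("d1"))
```

Both fields answer the search for "brown" with the same documents; only the stored one can hand back "quick brown fox" for d1.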
Upvotes: 0
Reputation: 6357
_all is not a copy of all the fields; it is just another field, indexed from the values of all the other fields.
Let's say we have only three documents (d1, d2 and d3) in the index, with only two fields, f1 and f2:
d1: { "f1": "v1", "f2": "v2" }
d2: { "f1": "v2", "f2": "v2" }
d3: { "f1": "v1", "f2": "v1" }
Lucene stores this data in inverted indices, roughly like this:
Inverted index for field f1:
"v1" -> "d1", "d3"
"v2" -> "d2"
Inverted index for field f2:
"v1" -> "d3"
"v2" -> "d1", "d2"
When _all is enabled, there is an additional inverted index for the _all field.
Inverted index for field _all:
"v1" -> "d1", "d3"
"v2" -> "d1", "d2"
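Counting posting entries, the indices without _all hold 6 document references in total, while with _all they hold 10, not 12. The _all posting lists are shorter than the f1 and f2 lists combined because a document that has the same value in both fields (d2 with "v2", d3 with "v1") is listed only once under that term in _all.
This is just a simple example to show that enabling _all does not simply double the index size.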
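The counting argument above can be checked with a short Python sketch. It uses posting entries (term, document pairs) as a rough proxy for index size, which is an assumption: real Lucene postings also record term frequencies, positions and norms, and are compressed. `build_index` and `posting_count` are hypothetical helpers for illustration only. Because each term maps to a set of document IDs, a document with the same value in several fields appears only once under that term in the combined _all index.

```python
from collections import defaultdict

# The three example documents from the answer.
docs = {
    "d1": {"f1": "v1", "f2": "v2"},
    "d2": {"f1": "v2", "f2": "v2"},
    "d3": {"f1": "v1", "f2": "v1"},
}


def build_index(docs, fields):
    """Hypothetical helper: term -> set of doc IDs over the given fields combined."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for field in fields:
            index[doc[field]].add(doc_id)  # a set records each doc once per term
    return index


def posting_count(index):
    # Total number of (term, doc) entries across all posting lists.
    return sum(len(ids) for ids in index.values())


per_field = {f: build_index(docs, [f]) for f in ("f1", "f2")}
all_field = build_index(docs, ["f1", "f2"])  # _all draws on every field

without_all = sum(posting_count(ix) for ix in per_field.values())
with_all = without_all + posting_count(all_field)
print(without_all, with_all)  # 6 10
```

The _all index contributes 4 entries instead of 6, which is why the total grows to 10 rather than doubling to 12.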
Upvotes: 3