Reputation: 2033
I have ten billion documents. One field of each document is a timestamp (millisecond precision), and I used the following mapping when indexing:
timestamp:
  type: "date"
  format: "YYYY-MM-dd HH:mm:ss||YYYY-MM-dd HH:mm:ss.SSS"
  ignore_malformed: true
  doc_values: true
When searching, I use the range filter. As I understand it, even with doc_values enabled, the range filter internally uses the inverted index to search, and it is rather slow.
From the documentation: "The execution option controls how the range filter internally executes. The execution option accepts the following values: index: Uses the field’s inverted index in order to determine whether documents fall within the specified range."
Suppose I change the mapping to store only the day instead of hours/seconds/milliseconds:
day:
  type: "date"
  format: "YYYY-MM-dd"
  ignore_malformed: true
  doc_values: true
When searching with the range filter on the day field, it is faster.
Can someone explain why the performance differs?
With the first mapping (seconds/milliseconds), the inverted index (I assume it is internally a kind of hash table) has a huge number of keys, while with the second mapping (days only) it has far fewer keys. Is that the reason?
Upvotes: 0
Views: 551
Reputation: 6357
Your assumption is correct. The number of unique values is smaller when the time component of the date is not indexed. When running a range query, Elasticsearch has to "loop" over fewer postings lists, hence the performance improvement you observed.
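To make the term-count argument concrete, here is a minimal Python sketch of a toy inverted index. It is a simplified model, not how Elasticsearch is implemented: the function names `build_inverted_index` and `range_search` are hypothetical, terms are plain formatted strings, and real Lucene date fields also index extra lower-precision terms so that a range does not have to touch every unique value. The sketch only shows how the number of distinct terms inside a range changes with the stored granularity.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def build_inverted_index(timestamps, fmt):
    """Toy inverted index: formatted timestamp (term) -> list of doc ids."""
    index = defaultdict(list)
    for doc_id, ts in enumerate(timestamps):
        index[ts.strftime(fmt)].append(doc_id)
    return index


def range_search(index, low, high):
    """Visit every term in [low, high] and merge its postings list.

    Returns (sorted doc ids, number of terms visited). The zero-padded
    date strings used here sort lexicographically in time order.
    """
    hits = []
    visited = 0
    for term in sorted(index):
        if low <= term <= high:
            visited += 1
            hits.extend(index[term])
    return sorted(hits), visited


# Assumed sample data: 10,000 documents, one every 30 seconds.
start = datetime(2015, 1, 1)
timestamps = [start + timedelta(seconds=30 * i) for i in range(10_000)]

second_index = build_inverted_index(timestamps, "%Y-%m-%d %H:%M:%S")
day_index = build_inverted_index(timestamps, "%Y-%m-%d")

# Same two-day range, expressed at each granularity.
docs_s, terms_s = range_search(
    second_index, "2015-01-01 00:00:00", "2015-01-02 23:59:59"
)
docs_d, terms_d = range_search(day_index, "2015-01-01", "2015-01-02")

print("unique terms (seconds):", len(second_index), "visited:", terms_s)
print("unique terms (days):   ", len(day_index), "visited:", terms_d)
print("same documents matched:", docs_s == docs_d)
```

Both searches return the same documents, but the second-granularity index has to merge thousands of single-document postings lists, while the day-granularity index merges only two large ones.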
Upvotes: 1