Reputation: 11
I have a use case where I am using Elasticsearch data streams to store log-related data.
The current Elasticsearch policies are set to roll over the indices on two conditions.
Whenever this data is queried via the APIs, it always contains a @timestamp filter (this can be anything between the last 5 minutes and the last 7 days).
What would be the best approach to query this data?
Note: Data volume can be around 200-600 GB in a data stream.
Is there any tradeoff in using either of the two above?
Also feel free to recommend better approaches.
Upvotes: 0
Views: 62
Reputation: 30163
Elasticsearch is optimized to skip segments by looking at field stats and not processing segments whose data has no intersection with the range filter. Moreover, if your query involves more than 128 shards (controlled by the `pre_filter_shard_size` setting), Elasticsearch will execute a special pre-filter query in which it gathers field stats from the shards and skips all shards that have no matching documents. So, I would try going with 1 and allow Elasticsearch to do its thing.
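As a minimal sketch, the @timestamp range described in the question would look like the request body below. The data stream name `logs-app` and the cluster URL are assumptions; putting the range in a `bool` `filter` clause (rather than a scoring context) lets Elasticsearch cache it and apply the shard/segment skipping described above.

```python
# Search request body with a @timestamp range filter, as described in the
# question. Window bounds ("now-7d" .. "now") are illustrative; the question
# says anything from the last 5 minutes to the last 7 days.
search_body = {
    "query": {
        "bool": {
            "filter": [
                {"range": {"@timestamp": {"gte": "now-7d", "lte": "now"}}}
            ]
        }
    }
}

# Sending it with the official elasticsearch-py client (data stream name and
# cluster URL are assumptions; requires a running cluster, so commented out):
# from elasticsearch import Elasticsearch
# es = Elasticsearch("http://localhost:9200")
# resp = es.search(index="logs-app", **search_body)
```

Searching the data stream name itself (rather than individual backing indices) lets the pre-filter phase skip backing indices whose @timestamp min/max do not intersect the range.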
Upvotes: 0