Hrm

Reputation: 11

Elasticsearch querying time-series data, which approach is more efficient?

I have a use case where I am using Elasticsearch data streams to store log data.

The current Elasticsearch policies are set to roll over the indices on two conditions:

  1. Rollover if age is 1 day
  2. Rollover if max primary shard size is 24GB

Whenever this data is queried via the APIs, the query always contains a @timestamp filter (anywhere between the last 5 minutes and the last 7 days).

What would be the best approach to query this data?

  1. Query the data stream directly by name (passing the timestamp filter in the query)
  2. Query only specific backing indices, chosen based on the selected time range

Note: Data volume can be around 200-600 GB in a data stream.

Is there any tradeoff between the two options above?

Also feel free to recommend better approaches.

Upvotes: 0

Views: 62

Answers (1)

imotov

Reputation: 30163

Elasticsearch is optimized to skip segments by looking at field stats: it does not process segments whose data has no intersection with the range filter. Moreover, if your query involves more than 128 shards (controlled by the pre_filter_shard_size setting), Elasticsearch will execute a special pre-filter phase in which it gathers field stats from the shards and skips all shards that have no matching documents. So I would go with option 1 and let Elasticsearch do its thing.
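
As a rough illustration of option 1, here is a minimal Python sketch that only builds the request path and body for a search against the data stream name, with the @timestamp range placed in filter context. The data stream name `logs-myapp-default`, the helper names, and the fixed "now" are hypothetical, chosen just for the example; sending the request with whatever HTTP or Elasticsearch client you use is left out. Passing a low `pre_filter_shard_size` is optional and is shown only as a way to lower the 128-shard threshold if you want the pre-filter phase to run on smaller searches.

```python
import json
from datetime import datetime, timedelta, timezone


def build_search_body(start, end, size=100):
    """Build a search body that keeps documents with start <= @timestamp < end."""
    return {
        "size": size,
        "query": {
            "bool": {
                # Filter context: no scoring is needed for a pure time filter.
                "filter": [
                    {
                        "range": {
                            "@timestamp": {
                                "gte": start.isoformat(),
                                "lt": end.isoformat(),
                            }
                        }
                    }
                ]
            }
        },
        "sort": [{"@timestamp": {"order": "desc"}}],
    }


def build_search_path(data_stream, pre_filter_shard_size=None):
    """Build the _search path for the data stream (not its backing indices)."""
    path = f"/{data_stream}/_search"
    if pre_filter_shard_size is not None:
        # Optional query parameter; the default threshold is 128 shards.
        path += f"?pre_filter_shard_size={pre_filter_shard_size}"
    return path


# Hypothetical example: the last 7 days, relative to a fixed point in time.
now = datetime(2024, 1, 8, 12, 0, tzinfo=timezone.utc)
body = build_search_body(now - timedelta(days=7), now)
path = build_search_path("logs-myapp-default", pre_filter_shard_size=1)

print("POST", path)
print(json.dumps(body, indent=2))
```

The same body works unchanged for a 5-minute window or a 7-day window; only `start` and `end` differ, and the caller never has to work out which backing indices cover the range.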

Upvotes: 0
