Eyal Ch

Reputation: 10056

In Elasticsearch, which is faster: filtering data by date, or indexing data into separate indexes by date?

I need to index ~1 billion records.

The data will be queried from Elasticsearch by month range (not only by a single month).

Which would be faster?

  1. Saving my documents in different indexes, say one index per month, or
  2. Saving everything in one index, where one of the document fields is 'date', and filtering by this field?

Upvotes: 0

Views: 172

Answers (1)

Jilles van Gurp

Reputation: 8294

If you are querying by month range, definitely split your indexes by month. With a billion documents, you'll probably want lots of shards across many nodes, and splitting by date gives you that. The alternative is a single index with a large number of shards: with a billion documents, that is probably dozens or hundreds of shards, depending on your document size and hardware.

However, if you split by date, most of your shards can cheaply answer that 0 documents match your query (assuming you get your filter query right), leaving the handful of shards that actually hold the data for the months covered to take care of the query. So it's like querying a smaller index that contains just the data you need for the query.
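As a rough sketch of what this could look like on the client side, here is some Python that lists the monthly indexes a date range touches and builds the matching filter. The `records-YYYY.MM` naming scheme, the `records-` prefix, and both helper functions are made up for illustration; only the `bool`/`filter`/`range` query shape is standard Elasticsearch query DSL, and the field name `date` comes from the question.

```python
from datetime import date


def monthly_indices(start: date, end: date, prefix: str = "records-") -> list[str]:
    """Return one index name per calendar month from start to end, inclusive.

    The "<prefix>YYYY.MM" naming scheme is an assumption for this sketch.
    """
    if start > end:
        raise ValueError("start must not be after end")
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        names.append(f"{prefix}{year:04d}.{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return names


def month_range_query(start: date, end: date, field: str = "date") -> dict:
    """Build a search body that restricts results to [start, end].

    The range clause sits in filter context, so it does not affect scoring.
    """
    return {
        "query": {
            "bool": {
                "filter": [
                    {
                        "range": {
                            field: {
                                "gte": start.isoformat(),
                                "lte": end.isoformat(),
                            }
                        }
                    }
                ]
            }
        }
    }


if __name__ == "__main__":
    start, end = date(2023, 11, 15), date(2024, 2, 3)
    # Comma-separated index names can be used as the search target.
    print(",".join(monthly_indices(start, end)))
    print(month_range_query(start, end))
```

You can either target only the listed indexes, or search a wildcard such as `records-*` with the same filter and let the shards outside the range report zero matches. Which of the two is faster for your data is something to measure rather than assume.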

Upvotes: 2
