user1452030

Reputation: 1041

ElasticSearch - Update or new index?

Requirements:

Given these requirements, we are planning to do the following:

  1. For incremental updates (diffs) we can insert or update records as-is using the bulk API (see the sketch after this list)
  2. For full updates we will reconstruct a new index and swap the alias as mentioned in this post. In case of a rollback, we can revert to the previous working index (backups are also maintained if the rollback needs to go back a few versions)
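
For the incremental case, here is a minimal sketch of the bulk insert/update step, assuming the official elasticsearch-py client and a hypothetical weekly index name such as products-2024-w05 (adjust names and connection details to your setup):

    from elasticsearch import Elasticsearch, helpers

    es = Elasticsearch("http://localhost:9200")

    # Hypothetical weekly diff: a list of dicts, each carrying its own id.
    weekly_diff = [
        {"id": "42", "name": "widget", "price": 9.99},
    ]

    # The "index" op type inserts the document, or overwrites it if the id
    # already exists, which matches "insert or update records as-is".
    actions = (
        {
            "_op_type": "index",
            "_index": "products-2024-w05",
            "_id": doc["id"],
            "_source": doc,
        }
        for doc in weekly_diff
    )

    success, _ = helpers.bulk(es, actions)
    print(f"indexed {success} documents")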

Questions:

  1. Is this the best approach, or is it better, when reconstructing an index, to CRUD documents on the previously created index using the built-in versioning?
  2. What is the impact of modifying data (delete, update) on the underlying Lucene indices/shards? Can modifications cause fragmentation or inefficiency?

Upvotes: 2

Views: 1181

Answers (1)

Val

Reputation: 217554

  1. At first glance, I'd say that your overall approach is sound. Creating a new index every week with the new data and swapping an alias is a good approach if you need:

    • zero downtime, and
    • to be able to roll back to the previous indices for whatever reason

If you were to keep only one index and CRUD your documents in there, you wouldn't be able to roll back if anything went wrong, and you could end up in a mixed state with data from the current week and data from the previous week.
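
For completeness, the alias swap itself can be done atomically in a single call, so searches against the alias never see a gap, and rolling back is the same call with the indices reversed. A sketch, again assuming the elasticsearch-py client and hypothetical index/alias names (the exact method signature varies slightly between client versions):

    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")

    # Remove the alias from last week's index and point it at the new one
    # in one atomic request.
    es.indices.update_aliases(body={
        "actions": [
            {"remove": {"index": "products-2024-w04", "alias": "products"}},
            {"add":    {"index": "products-2024-w05", "alias": "products"}},
        ]
    })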

  2. Every time you update (even a single field) or delete a document, the previous version is flagged as deleted in the underlying Lucene segment. When the Lucene segments have grown sufficiently big, ES merges them and wipes out the deleted documents. However, in your case, since you're creating a new index every week (and eventually deleting the index from the week prior), you won't run into space and/or fragmentation issues.
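
If you still want to keep an eye on how many deleted documents are sitting in the segments (or expunge them without waiting for a natural merge), a sketch along these lines works, assuming elasticsearch-py and a hypothetical index name:

    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")

    index = "products-2024-w05"

    # docs.deleted counts documents flagged as deleted but not yet merged away.
    stats = es.indices.stats(index=index, metric="docs")
    deleted = stats["indices"][index]["primaries"]["docs"]["deleted"]
    print(f"{deleted} deleted docs awaiting merge in {index}")

    # Optional: ask Lucene to merge away the deleted docs; usually unnecessary
    # if the whole index is dropped every week anyway.
    es.indices.forcemerge(index=index, only_expunge_deletes=True)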

Upvotes: 2
