Reputation: 8106
When reindexing an update-heavy index in Elasticsearch that's actively being used, we first perform an initial reindex. After the first reindex is complete, we update the alias to point to the new index. But in the time it takes to perform the first reindex, some of the documents in the original index may have been updated. Because of this, we perform a second reindex to ensure that updates made during the first reindex reach the new index.
Am I doing this wrong? Are updates that arrive while a reindex is running applied to the new index when the reindex finishes?
For example, if I'm reindexing users-v1 to users-v2 and this takes 6 hours, many documents in users-v1 will have been updated by the time the reindex finishes. If I sync user John in the first hour, and an update for John is made in the 4th hour, will that update also be applied to users-v2? Or will I need to perform a second reindex after switching the alias to ensure that update made it?
Upvotes: 1
Views: 541
Reputation: 9770
You're doing the right thing: a second reindex is necessary, because updates that happen during the first reindex are not applied automatically.
Hopefully, you have a lastUpdatedDate field or something similar, so that in the second reindex you can provide a query to reindex only the documents that have changed.
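As a rough sketch of that second pass, the request body for the `_reindex` API could restrict the source to documents changed since the first reindex started. The helper name `build_delta_reindex_body` and the example timestamp are made up for illustration, and this assumes lastUpdatedDate is mapped as a date field; adjust to your own mapping.

```python
import json


def build_delta_reindex_body(source_index, dest_index, since, field="lastUpdatedDate"):
    """Build a _reindex request body that copies only documents whose
    `field` is at or after `since` (the start time of the first reindex)."""
    return {
        "source": {
            "index": source_index,
            # Range query: only documents updated since the first reindex began.
            "query": {"range": {field: {"gte": since}}},
        },
        "dest": {"index": dest_index},
    }


if __name__ == "__main__":
    # Hypothetical start time of the first reindex.
    body = build_delta_reindex_body("users-v1", "users-v2", "2024-01-01T00:00:00Z")
    # Send this JSON as the body of: POST _reindex
    print(json.dumps(body, indent=2))
```

Recording the start time just before kicking off the first reindex, and using that as `since`, should keep the second pass small compared to a full copy.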
One thing to take into account is deletes - by default the second reindex will not be aware of the documents that were deleted during the first reindex. To combat this you can either use soft deletes (instead of deleting, flag a doc as 'deleted'), or, if possible, have the deleting client log all the deleted document ids, and later delete them from the target index.
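For the logged-ids approach, one possible sketch is to turn the collected ids into a payload for the `_bulk` API, which takes newline-delimited JSON with one `delete` action per line. The helper name `build_bulk_delete_payload` and the sample ids are hypothetical; how the deleting client logs the ids is up to you.

```python
import json


def build_bulk_delete_payload(index, doc_ids):
    """Build a newline-delimited JSON payload for the _bulk API that
    deletes each logged document id from the target index."""
    lines = [
        json.dumps({"delete": {"_index": index, "_id": str(doc_id)}})
        for doc_id in doc_ids
    ]
    # The bulk API requires every line, including the last, to end with a newline.
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    # Hypothetical ids logged by the deleting client during the first reindex.
    payload = build_bulk_delete_payload("users-v2", ["42", "1337"])
    # Send as the body of: POST _bulk  (Content-Type: application/x-ndjson)
    print(payload, end="")
```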
Upvotes: 3