Elasticsearch Reindex API getting slower

Question

I have an index with 88 million docs, 0 replicas, 1 shard on an SSD. When I use the reindex API (with size 3000, refresh_interval -1), it gets slower and slower once we pass the 50 million mark.

I assume ES is checking whether each document already exists? Is there a way to reindex and strip the old document IDs so ES can generate new ones and index faster?

Also, how can I reindex from a specific point? The problem is that I have to pause my queue of new incoming docs until the reindex is complete, then switch the alias. It would be awesome if I could let the source index keep receiving new docs, then later start a second reindex to move over the docs that arrived while the big reindex was running.

lvandyk · Accepted Answer

Adding the following script to the reindex call fixed the issue:

ctx.remove('_id');
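
For context, here is a rough sketch of how that script might sit inside a full reindex request body. The index names old_index and new_index and the helper build_reindex_body are made up for illustration, and the batch size of 3000 is taken from the question. It assumes a version of Elasticsearch where the script text goes under the "source" key (older releases used "inline"), so check this against the docs for your version.

```python
import json


def build_reindex_body(source_index, dest_index, batch_size=3000):
    """Build a JSON body for POST _reindex that strips the original _id.

    Hypothetical helper: the index names are placeholders, and the
    "source" script key is an assumption about the ES version in use.
    """
    return {
        # source.size is the number of docs fetched per scroll batch
        "source": {"index": source_index, "size": batch_size},
        "dest": {"index": dest_index},
        # Removing _id lets Elasticsearch auto-generate a new id per doc
        "script": {"lang": "painless", "source": "ctx.remove('_id');"},
    }


if __name__ == "__main__":
    body = build_reindex_body("old_index", "new_index")
    print(json.dumps(body, indent=2))
```

The printed JSON would be sent as the body of POST _reindex. One thing to keep in mind: since every document gets a fresh ID, running the same reindex twice adds duplicates to the destination instead of overwriting the earlier copies.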
