Soapy

Reputation: 567

Easy way to continue a failed reindex?

I'm currently trying to reindex a large set of data (around 96 million documents) using the Python API, specifically the reindex command.

When running the command I eventually get a timeout error from the bulk command. I've tried setting the bulk_kwargs request_timeout to 24 hours, but it still times out... after 28 hours and 57 million records loaded. Re-running the reindex just deletes the existing documents and starts over.
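
For reference, this is roughly the call I'm running (the index names and host are placeholders; I'm using elasticsearch.helpers.reindex from elasticsearch-py):

    # Roughly the call in question -- index names and host are placeholders.
    from elasticsearch import Elasticsearch
    from elasticsearch.helpers import reindex

    es = Elasticsearch(['http://localhost:9200'])

    reindex(
        es,
        source_index='old-index',                # placeholder name
        target_index='new-index',                # placeholder name
        chunk_size=500,                          # documents per bulk request
        scroll='5m',                             # scroll context keep-alive
        bulk_kwargs={'request_timeout': 86400},  # 24 hours -- still times out
    )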

Regardless of why the error happens (I think I have a disk bottleneck, which I can fix; there are no out-of-memory errors), is there any easy way to continue the reindex from where it died?

Upvotes: 2

Views: 2002

Answers (1)

turkus

Reputation: 4903

If re-running really does delete the existing documents and start over anyway, then just delete the index yourself, create a new one, and feed it from scratch. That will be faster.
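
Something like this (the index name is a placeholder):

    # Drop the half-filled target index and recreate it empty.
    from elasticsearch import Elasticsearch

    es = Elasticsearch(['http://localhost:9200'])
    es.indices.delete(index='new-index', ignore=[404])  # ignore if missing
    es.indices.create(index='new-index')
    # ...then run the reindex again into the fresh index.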

OR

If you cannot have an empty index, then delete items one by one (or in batches), identified by some id, and insert the updated documents according to that id. A rough sketch of that idea follows.
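
This is only a hypothetical sketch (index names, host, and batch size are examples): scan the source, ask the target via mget which ids are already there, and bulk-insert just the missing documents:

    # Hypothetical resume-by-id sketch: copy only documents whose ids are
    # not yet present in the target index.
    from elasticsearch import Elasticsearch
    from elasticsearch.helpers import scan, bulk

    es = Elasticsearch(['http://localhost:9200'])
    SOURCE, TARGET = 'old-index', 'new-index'  # placeholder names
    BATCH = 500                                # example batch size

    def filter_missing(batch):
        # mget against the target tells us which ids were already copied.
        ids = [h['_id'] for h in batch]
        found = es.mget(index=TARGET, body={'ids': ids}, _source=False)
        present = {d['_id'] for d in found['docs'] if d.get('found')}
        for h in batch:
            if h['_id'] not in present:
                yield {'_index': TARGET, '_id': h['_id'], '_source': h['_source']}

    def missing_docs():
        batch = []
        for hit in scan(es, index=SOURCE):
            batch.append(hit)
            if len(batch) == BATCH:
                for action in filter_missing(batch):
                    yield action
                batch = []
        if batch:
            for action in filter_missing(batch):
                yield action

    bulk(es, missing_docs(), request_timeout=3600)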

Upvotes: 2
