Reputation: 567
I'm currently trying to reindex a large set of data (around 96 million documents) using the Python API, specifically the reindex() helper.
When running it I eventually get a timeout error from the underlying bulk call. I've tried setting request_timeout in bulk_kwargs to 24 hours, but it still times out... after 28 hours and 57 million records loaded.
Re-running the reindex just deletes the documents already copied and starts over.
Regardless of why the error happens (I suspect a disk bottleneck, which I can fix; there are no out-of-memory errors), is there any easy way to continue the reindex from where it died?
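
For reference, here's roughly what the call looks like (host and index names are placeholders):

```python
from elasticsearch import Elasticsearch
from elasticsearch.helpers import reindex

# Placeholder host and index names.
es = Elasticsearch(["localhost:9200"])

reindex(
    es,
    source_index="source_index",
    target_index="target_index",
    chunk_size=500,
    scroll="60m",                            # keep the scan cursor alive between batches
    bulk_kwargs={"request_timeout": 86400},  # 24 hours, passed through to bulk()
)
```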
Upvotes: 2
Views: 2002
Reputation: 4903
If re-running already deletes the existing documents and starts over, then just delete the target index, create a new one, and feed it from scratch. That will be faster.
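
A minimal sketch of that option, assuming the elasticsearch-py client and placeholder index names:

```python
from elasticsearch import Elasticsearch
from elasticsearch.helpers import reindex

es = Elasticsearch(["localhost:9200"])  # placeholder host

# Drop the half-filled target index and start from a clean slate.
es.indices.delete(index="target_index", ignore=[404])
es.indices.create(index="target_index")

# Feed it again in one go.
reindex(es, source_index="source_index", target_index="target_index")
```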
OR
If you cannot have an empty index, then delete the stale documents (one by one or in batches), identified by some id, and insert the updated versions under that same id.
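
A rough sketch of that second option using the streaming_bulk helper; get_updated_docs() is a hypothetical generator for whatever identifies your changed records. Indexing under the same _id replaces the old copy, which covers the delete-then-insert in one step:

```python
from elasticsearch import Elasticsearch
from elasticsearch.helpers import streaming_bulk

es = Elasticsearch(["localhost:9200"])  # placeholder host

def actions(docs):
    # Index under the same _id so each re-run overwrites the stale copy
    # instead of duplicating it.
    for doc in docs:
        yield {
            "_op_type": "index",
            "_index": "target_index",
            "_type": "doc",          # omit on newer Elasticsearch versions
            "_id": doc["id"],        # hypothetical id field
            "_source": doc,
        }

# get_updated_docs() is hypothetical: however you fetch the changed records.
for ok, result in streaming_bulk(es, actions(get_updated_docs()), chunk_size=500):
    if not ok:
        print(result)
```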
Upvotes: 2