maxiruani

Reputation: 179

Which optimize policy should I use for heavy, non-stop indexing with Elasticsearch?

I have a search engine application that constantly parses feeds and indexes the results in ES (version 1.5.2).

The index holds about 3.5 million documents on average. The percentage of deleted documents sometimes reaches about 40%, and I am getting request timeouts while indexing (bulk).

I would like to know which solution is best for this use case.

I am using a custom _id. I know it has a performance cost, but sadly changing it is not an option.

Thanks in advance

Upvotes: 0

Views: 66

Answers (1)

bittusarkar

Reputation: 6357

If some of your bulk index requests are timing out, that is an indication that you need to lower your indexing rate. Elasticsearch experts advise against using the optimize API. Segment merges run in the background and get rid of deleted documents automatically. Above all, never use the optimize API while you have a high indexing rate: it will only cause more indexing requests to time out. Optimize can also hurt search performance, because it is a very resource-intensive operation.
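If you want to watch whether the background merges are keeping up, one option is to compute the deleted-document ratio from the indices stats response (GET /<index>/_stats/docs). The snippet below is only a sketch: the function name deleted_ratio and the index name "feeds" are made up for illustration, and the sample dictionary is hand-written, not real output from a cluster.

```python
def deleted_ratio(stats, index_name):
    """Return deleted docs as a fraction of all docs held by the primaries.

    `stats` is assumed to be the parsed JSON body of the indices stats API.
    """
    docs = stats["indices"][index_name]["primaries"]["docs"]
    live, deleted = docs["count"], docs["deleted"]
    total = live + deleted
    # An empty index has no deleted documents to report.
    return deleted / total if total else 0.0


# Hand-written sample with the same shape as a stats response (hypothetical values).
sample = {"indices": {"feeds": {"primaries": {"docs": {"count": 6, "deleted": 4}}}}}
ratio = deleted_ratio(sample, "feeds")
```

A ratio that keeps climbing while indexing continues suggests merges are falling behind, which is another sign that the indexing rate is too high.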

In a nutshell, just reduce your indexing rate. That should solve most of the problems you have mentioned here: requests should stop timing out, and the percentage of deleted documents may also come down.
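One way to reduce the indexing rate on the client side is to send smaller bulk batches, pause between them, and back off when a request times out. This is a rough sketch, not a tuned implementation: index_in_batches and send_bulk are hypothetical names, send_bulk stands for whatever function in your application performs the actual bulk request and is assumed to raise TimeoutError on a timeout, and the batch size and pause values are placeholders to adjust for your cluster.

```python
import time


def index_in_batches(docs, send_bulk, batch_size=500, pause=0.5,
                     max_retries=5, sleep=time.sleep):
    """Send docs in fixed-size batches, pausing between batches.

    On a timeout the same batch is retried after a delay that doubles
    on every attempt; after max_retries retries the error is re-raised.
    """
    sent = 0
    for start in range(0, len(docs), batch_size):
        batch = docs[start:start + batch_size]
        delay = pause
        for attempt in range(max_retries + 1):
            try:
                send_bulk(batch)  # hypothetical: your own bulk call goes here
                break
            except TimeoutError:
                if attempt == max_retries:
                    raise
                delay *= 2  # exponential backoff before retrying this batch
                sleep(delay)
        sent += len(batch)
        sleep(pause)  # steady throttle between batches
    return sent
```

If timeouts still occur with these settings, lowering batch_size or raising pause is the first thing to try.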

Upvotes: 0
