Reputation: 179
I have a search engine application that constantly parses feeds and indexes the results in ES (version 1.5.2).
On average I have 3.5 million documents indexed. The percentage of deleted documents sometimes reaches about 40%, and some of my bulk indexing requests time out.
Which optimize policy should I use?
Should I stop indexing once or several times a day to optimize the index, so that the segments are merged and the percentage of deleted documents goes down?
I would like to know the best solution for this use case.
I am using a custom _id. I know it has a performance cost, but sadly changing it is not an option.
Thanks in advance
Upvotes: 0
Views: 66
Reputation: 6357
If some of your bulk index requests are timing out, that is an indication that you need to lower your indexing rate. Elasticsearch gurus advise against using the optimize API. Segment merges happen in the background and automatically get rid of deleted documents. Above all, never use the optimize API while you have a high indexing rate: it will only cause more indexing requests to time out. And yes, optimize can also hurt search performance, since it is a very resource-intensive operation.
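To see whether background merges are keeping up, the deleted-document percentage can be computed from the `docs` section of the index stats. This is only a minimal sketch: it assumes the stats have already been fetched and parsed into a dict with `count` (live documents) and `deleted` fields, and the function name `deleted_percentage` and the sample numbers are made up for illustration.

```python
def deleted_percentage(docs_stats):
    """Return deleted documents as a percentage of all documents on disk.

    `docs_stats` is assumed to look like
    {"count": <live docs>, "deleted": <deleted docs>}.
    """
    live = docs_stats["count"]
    deleted = docs_stats["deleted"]
    total = live + deleted
    if total == 0:
        return 0.0  # empty index: nothing to report
    return 100.0 * deleted / total


# Illustrative numbers only, not taken from a real cluster.
sample = {"count": 3500000, "deleted": 1500000}
print(round(deleted_percentage(sample), 1))
```

Watching this value over time while adjusting the indexing rate shows whether the merges are actually reclaiming deleted documents.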
In a nutshell, just reduce your indexing rate. That should solve most of the problems you have mentioned: requests should stop timing out, and the percentage of deleted documents may also come down.
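One way to lower the indexing rate on the client side is to send smaller bulk batches, pause between them, and back off when a request times out. The following is a rough sketch, not a drop-in implementation: `send_bulk(batch)` is a hypothetical callable that wraps whatever client call you use and raises `TimeoutError` on a request timeout, and the batch size and delays are placeholder values to tune, not recommendations.

```python
import time


def chunks(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def throttled_bulk(docs, send_bulk, batch_size=500, pause=0.5,
                   max_retries=5, sleep=time.sleep):
    """Send `docs` in small batches, pausing between them and backing off on timeouts.

    `send_bulk` is a hypothetical callable that indexes one batch and raises
    TimeoutError when the request times out.
    Returns the number of documents sent successfully.
    """
    sent = 0
    for batch in chunks(docs, batch_size):
        delay = pause
        for attempt in range(max_retries + 1):
            try:
                send_bulk(batch)
                sent += len(batch)
                break
            except TimeoutError:
                if attempt == max_retries:
                    raise  # give up after the last retry
                delay *= 2  # exponential backoff: wait longer after each timeout
                sleep(delay)
        sleep(pause)  # fixed pause between batches keeps the overall rate down
    return sent
```

If timeouts still occur with these settings, lowering `batch_size` or raising `pause` reduces the load further, at the cost of slower ingestion.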
Upvotes: 0