I have an instance of Elasticsearch running on a server. When I try to index a huge corpus using multiprocessing, I get a lot of timeout errors. It seems that Elasticsearch can handle only a small number of concurrent requests. I've followed the configuration suggested on the Elasticsearch website. Are there any suggestions on what I should do to increase its indexing performance in a multiprocessing setting? The index that I'm adding documents to has one shard.
There are several things you can do.
First, set refresh_interval. The refresh interval is how long it takes before newly added documents become visible to search (the default is 1 second). If you can tolerate the delay, set it to at least 30 seconds, or to -1 to disable refreshes entirely while you load the data. I have read that this can increase indexing performance by about 70%.
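As a rough sketch, the setting can be changed through the index settings REST endpoint (PUT /<index>/_settings) before the bulk load and reset afterwards. The helper name, the node address and the index name "my-index" below are hypothetical; the code only builds the request with the standard library, so nothing is sent until you pass it to urllib.request.urlopen.

```python
import json
import urllib.request


def refresh_interval_request(base_url, index, interval):
    """Build a PUT /<index>/_settings request that changes refresh_interval.

    interval is a time string such as "30s", "-1" to disable refreshes,
    or None, which is sent as JSON null and resets the setting to its default.
    """
    body = {"index": {"refresh_interval": interval}}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{index}/_settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
```

A typical pattern is to send refresh_interval_request("http://localhost:9200", "my-index", "-1") before starting the worker processes and refresh_interval_request(..., None) once they have all finished, so search visibility returns to normal after the load.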
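The second thing you can try is to use the bulk API instead of indexing one document per request. Sending many documents in each request greatly reduces per-request overhead, and with multiprocessing it also means far fewer concurrent requests hitting the node.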
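A minimal sketch of what a bulk request looks like, assuming you talk to the REST API directly: the body is newline-delimited JSON where each document is preceded by an action line, and the whole body must end with a newline. The function names, the batch size and the index name are illustrative assumptions; each worker process could take its own chunks and send them with urllib.request.urlopen.

```python
import json
import urllib.request
from itertools import islice


def bulk_body(docs, index):
    """Serialise documents as an NDJSON body for POST /_bulk.

    Each document becomes two lines: an "index" action naming the target
    index, then the document source. The body ends with a newline.
    """
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"


def chunks(iterable, size):
    """Yield lists of at most `size` items so each bulk request stays bounded."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def bulk_request(base_url, body):
    """Build (but do not send) a POST /_bulk request for an NDJSON body."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/_bulk",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )
```

If you use the official Python client instead, its elasticsearch.helpers.bulk and elasticsearch.helpers.parallel_bulk helpers do this batching for you, and the response of each bulk call should still be checked for per-document errors.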
Disabling swap can also improve performance in some cases, because swapping parts of the JVM heap out to disk makes Elasticsearch very slow.
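If turning swap off on the whole machine (for example with swapoff -a) is not an option, Elasticsearch can instead lock its memory in RAM when bootstrap.memory_lock: true is set in elasticsearch.yml. The sketch below assumes that setting and checks its effect using the JSON returned by GET _nodes?filter_path=**.mlockall; the helper name is hypothetical and the HTTP call itself is left out.

```python
def nodes_without_memory_lock(nodes_info):
    """Return the ids of nodes whose memory is not locked in RAM.

    nodes_info is the parsed JSON from GET _nodes?filter_path=**.mlockall,
    shaped like {"nodes": {"<node id>": {"process": {"mlockall": true}}}}.
    """
    return sorted(
        node_id
        for node_id, info in nodes_info.get("nodes", {}).items()
        if not info.get("process", {}).get("mlockall", False)
    )
```

An empty list means every node reported mlockall as true; any id returned points at a node where the lock did not take effect, which usually shows up as a warning in that node's startup log.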
Another option is to increase the amount of RAM assigned to Elasticsearch, i.e. the JVM heap size set with -Xms and -Xmx.
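The Elasticsearch documentation recommends setting -Xms and -Xmx to the same value, using no more than half of the machine's RAM (the rest is used by the filesystem cache), and staying below the compressed-object-pointer threshold, which it describes as about 26 GB on most systems. The helper below is only a rule-of-thumb sketch of that guidance; the function names and the 26 GB default cap are assumptions, and recent versions can size the heap automatically, so check the documentation for your version before overriding it.

```python
def suggested_heap_gb(total_ram_gb, cap_gb=26):
    """Rule of thumb: half the machine's RAM, capped at cap_gb, at least 1 GB."""
    return max(1, min(int(total_ram_gb // 2), cap_gb))


def jvm_heap_flags(heap_gb):
    """Equal min and max heap flags, e.g. for ES_JAVA_OPTS or a jvm.options file."""
    return f"-Xms{heap_gb}g -Xmx{heap_gb}g"
```

For example, a 16 GB server would get jvm_heap_flags(suggested_heap_gb(16)), i.e. an 8 GB heap with matching minimum and maximum, which avoids heap resizing at runtime.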
Finally, increasing the share of the heap used as the indexing buffer (the indices.memory.index_buffer_size setting) can increase write performance. The default is 10 percent of the heap.
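To see what a change to indices.memory.index_buffer_size means in practice, here is a small arithmetic sketch. It assumes the setting is given as a percentage of the heap (default 10%) and is floored by indices.memory.min_index_buffer_size (default 48mb); the resulting buffer is shared by all shards indexing on that node. The function name is hypothetical, and since this is a static node setting you would change it in elasticsearch.yml (for example indices.memory.index_buffer_size: 20%) and restart the node.

```python
def index_buffer_mb(heap_mb, percent=10, min_mb=48):
    """Approximate node-wide indexing buffer in MB for a percentage-based
    indices.memory.index_buffer_size, floored at min_index_buffer_size."""
    return max(heap_mb * percent // 100, min_mb)
```

With a 4 GB heap, the default gives index_buffer_mb(4096) of roughly 409 MB, while index_buffer_mb(4096, percent=20) gives about 819 MB; on a very small heap the 48 MB floor applies instead.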
I hope these points help.