Reputation: 75
I am new to Elasticsearch and I have a file with 180 fields and 12 million lines. I created an index and type in Elasticsearch and load the data with a Java program, but it takes 1.5 hours. Is there a better way to load the data into Elasticsearch in less time? I tried a MapReduce program, but it sometimes fails, generates duplicate entries, and takes more time than my sequential program.
Can anybody give good suggestions?
Upvotes: 3
Views: 286
Reputation: 1715
A few things to try:

- When using the ES-Hadoop plugin, disable speculative execution to avoid duplicate entries. Speculative execution launches backup copies of slow tasks, and each copy indexes the same documents again, which is the likely source of your duplicates.
- Fine-tune the batch size of the bulk API when using MapReduce to index the data (see the sketch after this list). For more information, refer to https://www.elastic.co/guide/en/elasticsearch/hadoop/current/configuration.html and adjust the defaults until you get the best throughput.
- Also try increasing the Elasticsearch heap size.
- If you need to extract content out of the files first, you can use Apache Tika or the Elasticsearch mapper-attachments plugin.
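For reference, here is a minimal sketch of a MapReduce driver with those settings applied, assuming Hadoop 2.x and the ES-Hadoop `EsOutputFormat`; the host, index/type name, and batch values are placeholders you would tune for your cluster:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;
    import org.elasticsearch.hadoop.mr.EsOutputFormat;

    public class EsBulkIndexJob {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();

            // ES-Hadoop connection settings (placeholder host and index/type)
            conf.set("es.nodes", "localhost:9200");
            conf.set("es.resource", "myindex/mytype");

            // Larger bulk requests per task (defaults are 1mb / 1000 docs);
            // raise these gradually and watch for bulk rejections
            conf.set("es.batch.size.bytes", "10mb");
            conf.set("es.batch.size.entries", "5000");

            // Disable speculative execution so Hadoop never runs duplicate
            // task attempts that would index the same documents twice
            // (on Hadoop 1.x the keys are mapred.map.tasks.speculative.execution
            // and mapred.reduce.tasks.speculative.execution)
            conf.setBoolean("mapreduce.map.speculative", false);
            conf.setBoolean("mapreduce.reduce.speculative", false);

            // Optional: use a natural key from your data as the document id,
            // so re-running a failed job overwrites instead of duplicating
            // conf.set("es.mapping.id", "myIdField");

            Job job = Job.getInstance(conf, "es-bulk-index");
            job.setOutputFormatClass(EsOutputFormat.class);
            // ... set your mapper, input format, and input path here ...
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

Setting es.mapping.id is particularly useful in your case: it makes the writes idempotent, so even if a task is retried, the duplicates disappear.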
Hope it helps!
Upvotes: 0