JiboOne

Reputation: 1548

Elasticsearch retrieves documents slowly

I'm using the Java API to retrieve records from Elasticsearch. It takes approximately 5 seconds to retrieve 100000 documents (records/rows) in a Java application.

Is that slow for Elasticsearch, or is it normal?

Here are the index settings:

[image: index settings]

I tried to get better performance, but without success. Here is what I did:

Here is my Java implementation:

private void getDocuments() {
    int counter = 1;
    try {
        lgg.info("started");
        TransportClient client = new PreBuiltTransportClient(Settings.EMPTY)
                .addTransportAddress(new TransportAddress(InetAddress.getByName("localhost"), 9300));

        // Initial scroll request: match all documents, 10000 hits per page,
        // keeping only payment_id from _source.
        SearchResponse scrollResp = client.prepareSearch("ebpp_payments_union")
                .setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
                .setQuery(QueryBuilders.matchAllQuery())
                .setScroll(new TimeValue(1000))
                .setFetchSource(new String[] { "payment_id" }, null)
                .setSize(10000)
                .get();

        do {
            // Log every 100000th hit to track progress.
            for (SearchHit hit : scrollResp.getHits().getHits()) {
                if (counter % 100000 == 0) {
                    lgg.info(counter + "--" + hit.getSourceAsString());
                }
                counter++;
            }

            // Fetch the next page of the scroll.
            scrollResp = client.prepareSearchScroll(scrollResp.getScrollId())
                    .setScroll(new TimeValue(60000))
                    .execute()
                    .actionGet();
        } while (scrollResp.getHits().getHits().length != 0);

        client.close();
    } catch (UnknownHostException e) {
        e.printStackTrace();
    }
}

I know that TransportClient is deprecated. I also tried RestHighLevelClient, but it does not change anything.

Do you know how to get better performance?

Should I change something in Elasticsearch or modify my Java code?

Upvotes: 2

Views: 1637

Answers (2)

Pierre Mallet

Reputation: 7221

I see three possible areas for optimization:

1/ Sort your documents on the _doc key:

Scroll requests have optimizations that make them faster when the sort order is _doc. If you want to iterate over all documents regardless of the order, this is the most efficient option:

(documentation source)

2/ Reduce your page size; 10000 seems high. Can you run different tests with smaller values such as 5000 or 1000?

3/ Remove the source filtering

.setFetchSource(new String[] { "payment_id" }, null)

Source filtering can be expensive, since the Elasticsearch node has to read the source, parse it into an object, and then filter it. Can you try removing it? The network load will increase, but it's a trade-off :)
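
The three suggestions above can be tried together. What follows is a minimal sketch, not code from this answer: it uses only the JDK and builds the JSON body of the initial scroll search as a string rather than going through the Elasticsearch Java client. The class `ScrollTuningSketch` and its methods `searchBody` and `timeScans` are hypothetical names, and the `scan` callback stands in for a real scroll loop such as the one in the question. With the question's request builder, the corresponding change should be a call like `addSort("_doc", SortOrder.ASC)` plus a smaller `setSize(...)`.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntConsumer;
import java.util.stream.Collectors;

// Hypothetical helper for illustration; not part of any Elasticsearch client library.
class ScrollTuningSketch {

    /**
     * Builds the JSON body of the initial scroll search, sorted on _doc.
     * An empty sourceFields list omits the "_source" key, so no source
     * filtering is requested. Assumes plain field names that need no escaping.
     */
    static String searchBody(int pageSize, List<String> sourceFields) {
        StringBuilder json = new StringBuilder("{");
        json.append("\"size\":").append(pageSize).append(',');
        json.append("\"sort\":[\"_doc\"],");
        if (!sourceFields.isEmpty()) {
            String fields = sourceFields.stream()
                    .map(f -> "\"" + f + "\"")
                    .collect(Collectors.joining(","));
            json.append("\"_source\":[").append(fields).append("],");
        }
        json.append("\"query\":{\"match_all\":{}}}");
        return json.toString();
    }

    /**
     * Runs one full scan per page size and records the elapsed milliseconds.
     * The scan itself is supplied by the caller and receives the page size.
     */
    static Map<Integer, Long> timeScans(int[] pageSizes, IntConsumer scan) {
        Map<Integer, Long> elapsedMillis = new LinkedHashMap<>();
        for (int size : pageSizes) {
            long start = System.nanoTime();
            scan.accept(size);
            elapsedMillis.put(size, (System.nanoTime() - start) / 1_000_000);
        }
        return elapsedMillis;
    }

    public static void main(String[] args) {
        // Variant without source filtering, then the question's payment_id filter.
        System.out.println(searchBody(5000, List.of()));
        System.out.println(searchBody(5000, List.of("payment_id")));
        // Placeholder scan that does nothing; replace it with a real scroll loop.
        System.out.println(timeScans(new int[] {10000, 5000, 1000}, size -> { }));
    }
}
```

Running each combination several times against the same index, and comparing the timings, is what shows whether a given change actually helps on this data.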

Upvotes: 1

TheFiddlerWins

Reputation: 922

Performance troubleshooting and tuning are hard to do without understanding everything involved, but that does not seem very fast. Because this is a single-node cluster, you are going to run into some performance issues. In a production cluster you would have at least one replica for each shard, which can also be used for reads.

A few other things you can do:

  • Index your documents based on your most frequently searched attribute. This writes all documents with the same attribute value to the same shard, so ES does less work when reading. (This won't help you, since you have a single shard.)
  • Add multiple replica shards so you can fan out reads across the nodes in the cluster (once again, you need to actually have a cluster).
  • Don't put the master role on the same boxes as your data. In a moderate or large cluster you should have boxes that are neither master nor data; your app connects to them, so they manage the meta work for the searches and let the data nodes focus on data.
  • Use "query_then_fetch", unless you are using weighted searches, in which case you should probably stick with DFS.
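
To illustrate the last point: over the REST API the search type is a query-string parameter, so switching away from DFS is a one-word change. The sketch below uses only the JDK. The class `SearchTypeUrlSketch` and its method are hypothetical names, and `http://localhost:9200` assumes the default HTTP port rather than the transport port 9300 used in the question. With the question's request builder, the equivalent is passing `SearchType.QUERY_THEN_FETCH` to `setSearchType`, or dropping that call, since query_then_fetch is the default.

```java
import java.net.URI;

// Hypothetical helper for illustration; not part of any Elasticsearch client library.
class SearchTypeUrlSketch {

    /** Builds the URI of an initial scroll search with an explicit search type. */
    static URI scrollSearchUri(String baseUrl, String index, String keepAlive, String searchType) {
        return URI.create(baseUrl + "/" + index + "/_search"
                + "?scroll=" + keepAlive
                + "&search_type=" + searchType);
    }

    public static void main(String[] args) {
        // Base URL is an assumption; the index name is taken from the question.
        System.out.println(scrollSearchUri(
                "http://localhost:9200", "ebpp_payments_union", "1m", "query_then_fetch"));
    }
}
```

For a match_all query nothing is scored by relevance, so the extra term-statistics round that DFS performs buys nothing here.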

Upvotes: 1
