abi_pat

Retrieving large results from Elasticsearch using Scroll takes forever

I have a 3-node Elasticsearch (version 6.2.4) dev cluster. All configuration is left at the defaults, including the number of shards. I am trying to run searches that return millions of records, so I decided to use the Scroll API with the Java High-Level REST Client. My code looks like this:

MatchQueryBuilder matchQueryBuilder = new MatchQueryBuilder("galaxy", galaxyName);

SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.query(matchQueryBuilder);
searchSourceBuilder.size(scrollSize);

SearchRequest searchRequest = new SearchRequest();

searchRequest.indices(galaxyIndexName);
searchRequest.source(searchSourceBuilder);
searchRequest.scroll(TimeValue.timeValueSeconds(scrollTimeValue));

SearchResponse searchResponse = restHighLevelClient.search(searchRequest);

StarCollection starCollection = new StarCollection();

boolean moreResultsExist = true;

int resultCount = 0;

while (moreResultsExist) {

    String scrollId = searchResponse.getScrollId();

    for (SearchHit searchHit : searchResponse.getHits()) {

        Star star = objectMapper.readValue(searchHit.getSourceAsString(), Star.class);
        resultCount++;

        starCollection.addContentsItem(star);
    }

    if (resultCount >= searchResponse.getHits().getTotalHits()) {

        moreResultsExist = false;

        ClearScrollRequest request = new ClearScrollRequest();
        request.addScrollId(scrollId);
        restHighLevelClient.clearScroll(request);
    }

    SearchScrollRequest scrollRequest = new SearchScrollRequest(scrollId);
    scrollRequest.scroll(TimeValue.timeValueSeconds(scrollTimeValue));
    searchResponse = restHighLevelClient.searchScroll(scrollRequest);
}

When I run a search that returns 1.5 million documents, it takes forever and my method never finishes. Sometimes I get an exception like this:

org.elasticsearch.ElasticsearchException: Elasticsearch exception [type=search_context_missing_exception, reason=No search context found for id

So, I have the following questions:

  1. Is this the right way to use Scroll?
  2. What's the best way to run searches that return millions of records?

Answers (1)

vakarami

Is this the right way to use Scroll?

Yes, Scroll is the best way to retrieve large result sets.
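One thing in the loop above is worth fixing, though: on the final iteration it calls clearScroll and then still builds a SearchScrollRequest with the id it just cleared, and a cleared scroll id is answered with search_context_missing_exception. A common pattern is to keep scrolling until a page comes back with no hits, always continue with the most recently returned scroll id, and clear the scroll exactly once afterwards. The sketch below shows only that control flow, without the Elasticsearch dependency: ScrollClient, Page and FakeClient are hypothetical stand-ins (the comments note which RestHighLevelClient call each method corresponds to), and the fake leaves out the scroll keep-alive that the real searchScroll request should still set.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical sketch of the scroll loop control flow, with no Elasticsearch dependency.
final class ScrollLoopSketch {

    /** One page of hits plus the scroll id returned with it. */
    static final class Page {
        final String scrollId;
        final List<String> hits;
        Page(String scrollId, List<String> hits) { this.scrollId = scrollId; this.hits = hits; }
    }

    /** Hypothetical stand-in for the three RestHighLevelClient calls used in the question. */
    interface ScrollClient {
        Page search(int size);              // ~ restHighLevelClient.search(request with scroll set)
        Page searchScroll(String scrollId); // ~ restHighLevelClient.searchScroll(scrollRequest)
        void clearScroll(String scrollId);  // ~ restHighLevelClient.clearScroll(clearRequest)
    }

    /** Scrolls until a page comes back empty, then clears the scroll exactly once. */
    static List<String> fetchAll(ScrollClient client, int size) {
        List<String> all = new ArrayList<>();
        Page page = client.search(size);
        String scrollId = page.scrollId;
        try {
            while (!page.hits.isEmpty()) {
                all.addAll(page.hits);
                page = client.searchScroll(scrollId);
                scrollId = page.scrollId; // always continue with the most recent id
            }
        } finally {
            client.clearScroll(scrollId); // no scroll request is sent after this point
        }
        return all;
    }

    /** In-memory fake that, like Elasticsearch, rejects a scroll id once it is cleared. */
    static final class FakeClient implements ScrollClient {
        private final List<String> docs;
        private final Map<String, Integer> cursors = new HashMap<>();
        private int pageSize;
        int scrollCalls = 0;

        FakeClient(List<String> docs) { this.docs = docs; }

        @Override
        public Page search(int size) {
            pageSize = size;
            String id = UUID.randomUUID().toString();
            cursors.put(id, 0);
            return nextPage(id);
        }

        @Override
        public Page searchScroll(String scrollId) {
            scrollCalls++;
            if (!cursors.containsKey(scrollId)) {
                throw new IllegalStateException("No search context found for id " + scrollId);
            }
            return nextPage(scrollId);
        }

        @Override
        public void clearScroll(String scrollId) { cursors.remove(scrollId); }

        int openContexts() { return cursors.size(); }

        private Page nextPage(String id) {
            int from = cursors.get(id);
            int to = Math.min(from + pageSize, docs.size());
            cursors.put(id, to);
            return new Page(id, new ArrayList<>(docs.subList(from, to)));
        }
    }

    public static void main(String[] args) {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 25; i++) {
            docs.add("star-" + i);
        }
        FakeClient client = new FakeClient(docs);
        List<String> hits = fetchAll(client, 10);
        System.out.println(hits.size() + " hits, open scroll contexts: " + client.openContexts());
    }
}
```

Stopping on an empty page also removes the need to compare a running count against getTotalHits().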

What's the best way to run searches that return millions of records?

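First, ask yourself why you need so many records. Are you exporting your documents? If not, retrieving that many results rarely makes sense. You can limit the results by setting terminate_after on the search request (searchSourceBuilder.terminateAfter(n) in the Java client); note that this limit applies to the number of documents collected per shard.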

But if you really need all those records, break your query into smaller parts. For example, if the records have a date field, add a range filter on it and iterate over the data in small time spans (for example, 5-minute steps).
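A minimal sketch of the slicing step, assuming a date field exists: it splits [start, end) into consecutive half-open windows of at most the given step, so every document falls into exactly one window. The class TimeSlices and the field name "timestamp" are hypothetical; the comment shows roughly how one window could become a range filter with QueryBuilders in the real client.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits [start, end) into consecutive windows of at most `step`.
final class TimeSlices {

    static final class Window {
        final Instant from; // inclusive
        final Instant to;   // exclusive
        Window(Instant from, Instant to) { this.from = from; this.to = to; }
        @Override public String toString() { return "[" + from + ", " + to + ")"; }
    }

    static List<Window> split(Instant start, Instant end, Duration step) {
        if (step.isZero() || step.isNegative()) {
            throw new IllegalArgumentException("step must be positive");
        }
        List<Window> windows = new ArrayList<>();
        Instant from = start;
        while (from.isBefore(end)) {
            Instant to = from.plus(step);
            if (to.isAfter(end)) {
                to = end; // the last window may be shorter than `step`
            }
            windows.add(new Window(from, to));
            from = to;
        }
        return windows;
    }

    public static void main(String[] args) {
        Instant start = Instant.parse("2018-06-01T00:00:00Z");
        Instant end = Instant.parse("2018-06-01T00:12:00Z");
        for (Window w : split(start, end, Duration.ofMinutes(5))) {
            // With the real client, each window could become one smaller query, e.g.
            // QueryBuilders.boolQuery().must(matchQueryBuilder).filter(
            //     QueryBuilders.rangeQuery("timestamp").gte(w.from.toString()).lt(w.to.toString()))
            System.out.println(w);
        }
    }
}
```

Using gte/lt (rather than lte) on the boundaries keeps a document stamped exactly on a window edge from being returned twice.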

Finally, if more than scrollTimeValue passes between two consecutive scroll requests, the search context expires and you get a search_context_missing_exception.
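Because each scroll request sets a new expiry, scrollTimeValue only has to cover the processing of one batch (in the question, the loop that deserializes each hit with objectMapper), not the whole run. A hypothetical way to check that margin is to time each batch and flag it when it uses a large fraction of the keep-alive. KeepAliveBudget and its 0.5 safety factor are illustrative assumptions, not an Elasticsearch API.

```java
import java.time.Duration;

// Hypothetical helper: flags batches whose processing time gets close to the
// scroll keep-alive. Only the gap between two scroll requests matters, because
// every scroll request renews the keep-alive.
final class KeepAliveBudget {
    private final Duration keepAlive;
    private final double safetyFactor; // e.g. 0.5 = flag batches using half the keep-alive or more

    KeepAliveBudget(Duration keepAlive, double safetyFactor) {
        this.keepAlive = keepAlive;
        this.safetyFactor = safetyFactor;
    }

    /** True if a batch that took `elapsed` leaves too little margin before expiry. */
    boolean atRisk(Duration elapsed) {
        return elapsed.toNanos() >= (long) (keepAlive.toNanos() * safetyFactor);
    }

    /** Runs one batch of work and reports whether it came close to the keep-alive. */
    boolean timeBatch(Runnable batch) {
        long start = System.nanoTime();
        batch.run();
        return atRisk(Duration.ofNanos(System.nanoTime() - start));
    }

    public static void main(String[] args) {
        KeepAliveBudget budget = new KeepAliveBudget(Duration.ofSeconds(60), 0.5);
        boolean risky = budget.timeBatch(() -> {
            // placeholder for processing one page, e.g. the loop over searchResponse.getHits()
        });
        System.out.println(risky ? "raise scrollTimeValue or lower scrollSize" : "within budget");
    }
}
```

If batches are flagged, either a longer scrollTimeValue or a smaller scrollSize (fewer hits to process per request) widens the margin.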