Reputation: 602
I have a 3-node Elasticsearch cluster (version 6.2.4) for development. All settings are the defaults (including the number of shards). I am trying to run some searches that will return millions of records, so I decided to use Scroll with the Java High-Level REST Client. My code looks like this:
MatchQueryBuilder matchQueryBuilder = new MatchQueryBuilder("galaxy", galaxyName);
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.query(matchQueryBuilder);
searchSourceBuilder.size(scrollSize);
SearchRequest searchRequest = new SearchRequest();
searchRequest.indices(galaxyIndexName);
searchRequest.source(searchSourceBuilder);
searchRequest.scroll(TimeValue.timeValueSeconds(scrollTimeValue));
SearchResponse searchResponse = restHighLevelClient.search(searchRequest);
StarCollection starCollection = new StarCollection();
boolean moreResultsExist = true;
int resultCount = 0;
while (moreResultsExist) {
    String scrollId = searchResponse.getScrollId();
    for (SearchHit searchHit : searchResponse.getHits()) {
        Star star = objectMapper.readValue(searchHit.getSourceAsString(), Star.class);
        resultCount++;
        starCollection.addContentsItem(star);
    }
    // Stop on an empty batch or once every hit has been consumed.
    if (searchResponse.getHits().getHits().length == 0
            || resultCount >= searchResponse.getHits().getTotalHits()) {
        moreResultsExist = false;
        // Release the search context on the cluster.
        ClearScrollRequest request = new ClearScrollRequest();
        request.addScrollId(scrollId);
        restHighLevelClient.clearScroll(request);
    } else {
        // Only request the next batch while the scroll context is still open;
        // scrolling with a cleared id raises search_context_missing_exception.
        SearchScrollRequest scrollRequest = new SearchScrollRequest(scrollId);
        scrollRequest.scroll(TimeValue.timeValueSeconds(scrollTimeValue));
        searchResponse = restHighLevelClient.searchScroll(scrollRequest);
    }
}
Now, when I run a search that returns 1.5 million documents, it takes forever and my method never finishes. Sometimes I get an exception like:
org.elasticsearch.ElasticsearchException: Elasticsearch exception [type=search_context_missing_exception, reason=No search context found for id
So, I have the following questions:
Upvotes: 0
Views: 1993
Reputation: 625
Is this the right way to use Scroll?
Yes, Scroll is the recommended way to retrieve large result sets.
What's the best way to do searches which return millions of records?
First, ask why you need so many records. Are you exporting your documents? Otherwise, retrieving that many results rarely makes sense. You can limit the number of results by setting terminate_after in the search request; note that it caps the number of documents collected per shard, not the overall total.
But if you really need all those records, break the query into smaller parts. For example, if the records have a date field, put a range filter on it and iterate over it in smaller spans (for example, 5-minute steps).
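As a rough illustration of the "smaller spans" idea, here is a minimal sketch that only computes the time windows, using plain java.time. The class name TimeWindows, the Window type, and the sample dates are hypothetical and not from the question; how each window is turned into a query is left to the caller.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits [start, end) into consecutive half-open windows.
class TimeWindows {

    /** Half-open interval [from, to). */
    static final class Window {
        final Instant from;
        final Instant to;

        Window(Instant from, Instant to) {
            this.from = from;
            this.to = to;
        }

        @Override
        public String toString() {
            return "[" + from + ", " + to + ")";
        }
    }

    static List<Window> split(Instant start, Instant end, Duration step) {
        if (step.isZero() || step.isNegative()) {
            throw new IllegalArgumentException("step must be positive");
        }
        List<Window> windows = new ArrayList<>();
        Instant cursor = start;
        while (cursor.isBefore(end)) {
            Instant next = cursor.plus(step);
            if (next.isAfter(end)) {
                next = end; // the last window may be shorter than step
            }
            windows.add(new Window(cursor, next));
            cursor = next;
        }
        return windows;
    }

    public static void main(String[] args) {
        // Sample range chosen only for illustration.
        Instant start = Instant.parse("2018-06-01T00:00:00Z");
        Instant end = Instant.parse("2018-06-01T00:12:00Z");
        for (Window w : split(start, end, Duration.ofMinutes(5))) {
            System.out.println(w); // one smaller search would be run per window
        }
    }
}
```

Each window could then drive one smaller search, for instance by adding a range filter such as QueryBuilders.rangeQuery("timestamp").gte(w.from.toString()).lt(w.to.toString()) to the existing query, where "timestamp" stands in for whatever date field your documents actually have. Half-open windows keep a document that falls exactly on a boundary from being returned twice.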
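Finally, if the delay between two scroll requests exceeds scrollTimeValue, the search context expires and you get a search_context_missing_exception error. The keep-alive only needs to cover the processing of one batch, not the whole scan, because each scroll request renews it.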
Upvotes: 1