Danielson

Reputation: 2696

ElasticSearch Multiple Scrolls Java API

I want to get all data from an index. Since the number of items is too large to fit in memory, I use the Scroll API (a nice feature):

// QueryBuilders here is org.elasticsearch.index.query.QueryBuilders
client.prepareSearch(index)
    .setTypes(myType).setSearchType(SearchType.SCAN)
    .setScroll(new TimeValue(60000))
    .setSize(amountPerCall)
    .setQuery(QueryBuilders.matchAllQuery())
    .execute().actionGet();

This works well when I then call:

client.prepareSearchScroll(scrollId)
    .setScroll(new TimeValue(600000))
    .execute().actionGet();

But when I call the former method multiple times, I get the same scrollId each time, so I cannot run several scrolls in parallel.

I found http://elasticsearch-users.115913.n3.nabble.com/Multiple-scrolls-simultanious-td4024191.html which states that it is possible, though I don't know the poster's affiliation with ES.

Am I doing something wrong?

Upvotes: 3

Views: 4575

Answers (2)

Thomas Decaux

Reputation: 22661

You can run several scrolls against the same index at the same time; this is what elasticsearch-hadoop does.

Just don't forget that, under the hood, an index is composed of multiple shards that own the data, so you can scroll each shard in parallel by using:

.setPreference("_shards:1")
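As a rough illustration of that idea, here is a minimal sketch that fans out one task per shard and hands each task its own preference string ("_shards:0", "_shards:1", ...; shard numbers start at 0). The names ShardScrollFanOut, shardPreferences and scrollAllShards are hypothetical and not part of the Elasticsearch API, and the scrollOneShard function is only a placeholder for your own scroll loop (a prepareSearch call with .setPreference(pref), followed by repeated prepareSearchScroll calls).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

// Hypothetical helper, not part of the Elasticsearch client library.
class ShardScrollFanOut {

    // Builds one preference string per shard: "_shards:0", "_shards:1", ...
    static List<String> shardPreferences(int shardCount) {
        List<String> prefs = new ArrayList<>();
        for (int shard = 0; shard < shardCount; shard++) {
            prefs.add("_shards:" + shard);
        }
        return prefs;
    }

    // Runs scrollOneShard once per shard, in parallel, and returns the
    // results in shard order. shardCount must be at least 1.
    // scrollOneShard is a placeholder for the real scroll loop, which would
    // pass its argument to .setPreference(...) on the search request.
    static <T> List<T> scrollAllShards(int shardCount, Function<String, T> scrollOneShard) {
        ExecutorService pool = Executors.newFixedThreadPool(shardCount);
        try {
            List<CompletableFuture<T>> futures = new ArrayList<>();
            for (String pref : shardPreferences(shardCount)) {
                futures.add(CompletableFuture.supplyAsync(() -> scrollOneShard.apply(pref), pool));
            }
            List<T> results = new ArrayList<>();
            for (CompletableFuture<T> future : futures) {
                results.add(future.join()); // waits for this shard's task to finish
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

Each task would then open and drain its own scroll, so every shard gets a separate scrollId. The number of shards has to be known up front (it is an index setting), and one thread per shard is only an assumption that suits a small shard count.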

Upvotes: 0

Danielson

Reputation: 2696

After searching some more, I got the impression that this (same scrollId) is by design. The scroll is only released after its timeout has expired, and that timeout is reset after each call (see the related question "Elasticsearch scan and scroll - add to new index").

So you can only have one open scroll per index.

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html states:

Scrolling is not intended for real time user requests, but rather for processing large amounts of data, e.g. in order to reindex the contents of one index into a new index with a different configuration.

So it appears that what I wanted is deliberately not an option, possibly for optimization reasons.

Update
As stated above, creating multiple scrolls cannot be done, but this is only true when the query you use for scrolling is the same. If you scroll over, for instance, another type, another index, or just another query, you can have multiple scrolls.

Upvotes: 3
