Reputation: 2696
I want to retrieve all the data from an index. Since there are too many items to fit in memory, I use the Scroll API (a nice feature):
client.prepareSearch(index)
    .setTypes(myType)
    .setSearchType(SearchType.SCAN)
    .setScroll(new TimeValue(60000))
    .setSize(amountPerCall)
    .setQuery(QueryBuilders.matchAllQuery())
    .execute().actionGet();
This works well when I then call:
client.prepareSearchScroll(scrollId)
.setScroll(new TimeValue(600000))
.execute().actionGet()
But when I call the former method multiple times, I get the same scrollId each time, so I cannot run multiple scrolls in parallel.
I found http://elasticsearch-users.115913.n3.nabble.com/Multiple-scrolls-simultanious-td4024191.html, which states that it is possible, though I don't know the author's affiliation with ES.
Am I doing something wrong?
Upvotes: 3
Views: 4575
Reputation: 22661
You can run several scrolls over the same index at the same time; this is what elasticsearch-hadoop does.
Just don't forget that, under the hood, an index is composed of multiple shards that each hold part of the data, so you can scroll each shard in parallel by using:
.setPreference("_shards:1")
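A minimal sketch of how that per-shard idea could be wired up, not a definitive implementation. It only shows the fan-out: one worker per shard, each handed its own preference string. The class name, the shard count of 3, and the `scrollOneShard` callback are assumptions made for illustration; in real code the callback would open a scroll with `.setPreference(preference)` and page through it with `prepareSearchScroll` until no hits come back.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class ShardParallelScroll {

    // Builds the preference string that pins a search to one shard, e.g. "_shards:1".
    static String shardPreference(int shardId) {
        return "_shards:" + shardId;
    }

    // Runs one worker per shard in parallel and returns the results in shard order.
    // scrollOneShard is a hypothetical callback standing in for the real
    // Elasticsearch calls; it receives the preference string for its shard.
    static <T> List<T> scrollAllShards(int shardCount, Function<String, T> scrollOneShard)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(shardCount);
        try {
            List<Future<T>> futures = new ArrayList<>();
            for (int shard = 0; shard < shardCount; shard++) { // shard ids start at 0
                final String preference = shardPreference(shard);
                Callable<T> task = () -> scrollOneShard.apply(preference);
                futures.add(pool.submit(task));
            }
            List<T> results = new ArrayList<>();
            for (Future<T> future : futures) {
                results.add(future.get()); // waits for that shard's worker to finish
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in worker that only reports which preference it was given.
        List<String> seen = scrollAllShards(3, preference -> "scrolled " + preference);
        System.out.println(seen);
    }
}
```

Each worker gets its own scroll, so the shard count passed in has to match the number of primary shards of the index being read.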
Upvotes: 0
Reputation: 2696
After searching some more, I got the impression that this (the same scrollId) is by design. It appears to last until the timeout has expired, and the timeout is reset after each call (see "Elasticsearch scan and scroll - add to new index").
So you can only have one open scroll per index.
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html states:
Scrolling is not intended for real time user requests, but rather for processing large amounts of data, e.g. in order to reindex the contents of one index into a new index with a different configuration.
So it appears that what I wanted is not an option, and that this is on purpose, possibly for optimization reasons.
Update
As stated, creating multiple scrolls cannot be done, but this is only true when the query you use for scrolling is the same. If you scroll over, for instance, another type, another index, or just another query, you can have multiple scrolls.
Upvotes: 3