Reputation: 131
According to Breaking changes in 2.1: Search changes:
The scan search type has been deprecated. All benefits from this search type can now be achieved by doing a scroll request that sorts documents in _doc order.
However, in Node.js (with the official client) I see a huge performance gain when using the scan search type, as in this search request:
es.client.search({
  index: 'library',
  type: 'page',
  scroll: '30s',
  search_type: 'scan',
  fields: ['page_id'],
  q: 'book_id:1681'
}, ...);
Instead of this request:
es.client.search({
  index: 'library',
  type: 'page',
  scroll: '30s',
  sort: ['_doc'],
  fields: ['page_id'],
  q: 'book_id:1681'
}, ...);
Both requests return 12530 documents (after scrolling through all pages, of course). But the scan search type takes ~1s, while sorting in _doc order takes more than 4.5s!
Could you please tell me how to achieve all benefits from scan search by sorting documents in _doc order?
Update: Same results in Python. The scan search type is much faster than a regular scroll with sort in _doc order.
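For reference, here is a minimal sketch of how the _doc-sorted scroll could be driven from Python. It is not the exact code that was timed: the helper name scroll_all is hypothetical, and es is assumed to be an elasticsearch-py style client exposing search() and scroll().

```python
def scroll_all(es, scroll="2m", **search_kwargs):
    """Yield every hit from a scrolled search, one page at a time.

    Assumption: `es` behaves like an elasticsearch-py client, i.e. both
    search() and scroll() return a dict with '_scroll_id' and
    ['hits']['hits'].
    Note: this loop stops on the first empty page, so it fits the
    _doc-sorted request; a search_type='scan' request returns no hits
    in its first response and would need that page skipped.
    """
    resp = es.search(scroll=scroll, **search_kwargs)
    while resp["hits"]["hits"]:
        for hit in resp["hits"]["hits"]:
            yield hit
        # Ask for the next page using the scroll id from the last response.
        resp = es.scroll(scroll_id=resp["_scroll_id"], scroll=scroll)


# Hypothetical usage against a real cluster (not executed here):
# from elasticsearch import Elasticsearch
# es = Elasticsearch()
# body = '{"query":{"term":{"book_id":1681}},"sort":["_doc"]}'
# hits = list(scroll_all(es, index='library', doc_type='page', body=body))
```

Each iteration of the while loop is one round trip to the cluster, so the number of pages directly affects the total time.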
Upvotes: 1
Views: 2656
Reputation: 131
From this pull request, Adrien Grand wrote:
How many shards do you have? I'm asking because search_type='scan' retrieves $size * $num_shards docs per page, so the following would be a better comparison (assuming 5 shards):
es.search(index='library', doc_type='page', scroll='2m', search_type='scan', size=10, body='{"query":{"term":{"book_id":1681}}}')
vs.
es.search(index='library', doc_type='page', scroll='2m', size=50, body='{"query":{"term":{"book_id":1681}},"sort":["_doc"]}')
I tested this, and it now has the same performance as scan.
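To see why the page size matters, here is a rough sketch of the arithmetic in Python. It assumes 5 shards (as in the quote above) and a size of 10 for the original requests, which is the Elasticsearch default when size is not set; the function names are hypothetical, and the empty first response of a scan request is ignored.

```python
import math


def docs_per_page(size, num_shards, search_type=None):
    """Documents returned per scroll page.

    With search_type='scan', each shard returns `size` docs per page,
    so a page holds size * num_shards docs. Otherwise a page holds
    `size` docs in total.
    """
    if search_type == "scan":
        return size * num_shards
    return size


def scroll_pages(total_docs, size, num_shards, search_type=None):
    """Number of non-empty pages needed to fetch total_docs documents."""
    per_page = docs_per_page(size, num_shards, search_type)
    return math.ceil(total_docs / per_page)


total_docs = 12530   # from the question
num_shards = 5       # assumption, as in the quoted comparison

scan_pages = scroll_pages(total_docs, 10, num_shards, "scan")
doc_pages_small = scroll_pages(total_docs, 10, num_shards)
doc_pages_large = scroll_pages(total_docs, 50, num_shards)

print(scan_pages, doc_pages_small, doc_pages_large)
```

Under these assumptions, the _doc-sorted scroll with size=10 needs about five times as many round trips as scan, which is roughly in line with the ~1s vs. 4.5s timings. With size=50 the page counts match.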
Upvotes: 1