Osama Azzam

Reputation: 131

Elasticsearch: scan search vs sort by _doc

According to Breaking changes in 2.1: Search changes:

The scan search type has been deprecated. All benefits from this search type can now be achieved by doing a scroll request that sorts documents in _doc order.

However, I see a huge performance gain with the scan search type. In Node.js (with the official client), I used this search request:

es.client.search({
  index: 'library',
  type: 'page',
  scroll: '30s',
  search_type: 'scan',
  fields: ['page_id'],
  q: 'book_id:1681'
}, ...);

Instead of this request:

es.client.search({
  index: 'library',
  type: 'page',
  scroll: '30s',
  sort: ["_doc"],
  fields: ['page_id'],
  q: 'book_id:1681'
}, ...);

Both requests return 12530 documents (after scrolling through all pages, of course). But the scan search type takes ~1s, while sorting in _doc order takes more than 4.5s!

Could you please tell me how to achieve all benefits from scan search by sorting documents in _doc order?

Update: I get the same results in Python. The scan search type is much faster than a regular scroll sorted in _doc order.
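
For reference, a minimal sketch of such a Python scroll loop for the _doc-sorted request. It assumes `es` is an `elasticsearch.Elasticsearch` client (or any object with compatible `search`, `scroll`, and `clear_scroll` methods); `scroll_all` is a hypothetical helper name, not part of the client, and this is not necessarily the code that was timed above.

```python
def scroll_all(es, index, doc_type, body, size=10, scroll='30s'):
    """Yield every hit of a scroll search, page by page.

    `es` is assumed to be an elasticsearch.Elasticsearch client or a
    compatible object; `scroll_all` is a hypothetical helper.
    """
    # The initial search opens the scroll context and returns the first page.
    page = es.search(index=index, doc_type=doc_type, scroll=scroll,
                     size=size, body=body)
    scroll_id = page['_scroll_id']
    try:
        # An empty page means the scroll has been drained.
        while page['hits']['hits']:
            for hit in page['hits']['hits']:
                yield hit
            # Each scroll call returns the next page and a (possibly new) id.
            page = es.scroll(scroll_id=scroll_id, scroll=scroll)
            scroll_id = page['_scroll_id']
    finally:
        # Release the server-side search context.
        es.clear_scroll(scroll_id=scroll_id)
```

It could be called with a body such as `{"query": {"term": {"book_id": 1681}}, "sort": ["_doc"]}`. With `search_type='scan'` the initial response carries no hits, so the loop would have to start with a `scroll` call instead of stopping on the first empty page.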

Upvotes: 1

Views: 2656

Answers (1)

Osama Azzam

Reputation: 131

From this pull request, Adrien Grand wrote:

How many shards do you have? I'm asking because search_type='scan' retrieves $size * $num_shards docs per page, so the following would be a better comparison (assuming 5 shards):

es.search(index='library',
          doc_type='page',
          scroll='2m',
          search_type='scan',
          size=10,
          body='{"query":{"term":{"book_id":1681}}}')

vs.

es.search(index='library',
          doc_type='page',
          scroll='2m',
          size=50,
          body='{"query":{"term":{"book_id":1681}},"sort":["_doc"]}')

I tested this, and the scroll sorted in _doc order now performs the same as scan.
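
To see why the page size matters, here is a rough sketch of the arithmetic. The 12530 documents come from the question; the 5 shards and the size of 10 are assumptions (the question does not state either), and `scroll_pages` is a hypothetical helper.

```python
def scroll_pages(total_docs, size, num_shards, scan):
    """Number of non-empty pages needed to drain a scroll.

    With search_type='scan', `size` applies per shard, so a page holds up to
    size * num_shards documents; with a _doc sort, a page holds up to `size`.
    """
    per_page = size * num_shards if scan else size
    # Integer ceiling division.
    return (total_docs + per_page - 1) // per_page

# 12530 matching documents; 5 shards and size=10 are assumed values.
print(scroll_pages(12530, size=10, num_shards=5, scan=True))   # 251
print(scroll_pages(12530, size=10, num_shards=5, scan=False))  # 1253
print(scroll_pages(12530, size=50, num_shards=5, scan=False))  # 251
```

Under these assumptions the _doc-sorted scroll needs about five times as many round trips as scan at the same `size`, which is roughly in line with the timings in the question. Raising `size` to 50 brings both to the same number of pages.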

Upvotes: 1
