Bruce Johnston

Reputation: 8634

Why is it possible to get duplicate results from Azure Search when paging?

Sometimes when using Azure Search's paging there may be duplicate documents in the results. Here is an example of a paging request:

GET /indexes/myindex/docs?search=*&$top=15&$skip=15&$orderby=rating desc

Why is this possible? How can it happen? Are there any consistency guarantees when paging?

Upvotes: 4

Views: 2049

Answers (1)

Bruce Johnston

Reputation: 8634

The results of paginated queries are not guaranteed to be stable if the underlying index is changing, or if you are relying on sorting by relevance score. Paging simply changes the value of $skip for each page, but each query is independent and operates on the current view of the data (i.e., there is no snapshotting or other consistency mechanism like you’d find in a general-purpose database).

Here is an example of how you might get duplicates. Assume an index with four documents:

  1. { "id": "1", "rating": 5 }
  2. { "id": "2", "rating": 3 }
  3. { "id": "3", "rating": 2 }
  4. { "id": "4", "rating": 1 }

Now assume you want to page through the results with a page size of two, ordered by rating. You’d execute this query to get the first page:

$top=2&$skip=0&$orderby=rating desc

And get these results:

  1. { "id": "1", "rating": 5 }
  2. { "id": "2", "rating": 3 }

Now you insert a fifth document into the index:

{ "id": "5", "rating": 4 }

Shortly thereafter, you execute a query to fetch the second page of results:

$top=2&$skip=2&$orderby=rating desc

And get these results:

  1. { "id": "2", "rating": 3 }
  2. { "id": "3", "rating": 2 }

Notice that you’ve fetched document 2 twice. This is because the new document 5 has a greater value for rating, so it sorts before document 2 and lands on the first page, which pushes document 2 down to the second page.
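The walkthrough above can be reproduced with a small in-memory simulation. This is only a sketch: `fetch_page` is a hypothetical stand-in for a query with $top, $skip and $orderby=rating desc, not a call to the Azure Search API. Like the real service, it re-sorts the current contents of the index on every call and keeps no snapshot between pages.

```python
def fetch_page(index, top, skip):
    """Hypothetical stand-in for $top/$skip/$orderby=rating desc.

    Sorts the index as it exists right now, then slices out one page.
    """
    ordered = sorted(index, key=lambda doc: doc["rating"], reverse=True)
    return ordered[skip:skip + top]


# The four documents from the example.
index = [
    {"id": "1", "rating": 5},
    {"id": "2", "rating": 3},
    {"id": "3", "rating": 2},
    {"id": "4", "rating": 1},
]

# First page, fetched before the insert.
page1 = fetch_page(index, top=2, skip=0)

# A fifth document arrives between the two page requests.
index.append({"id": "5", "rating": 4})

# Second page, fetched against the changed index.
page2 = fetch_page(index, top=2, skip=2)

page1_ids = [doc["id"] for doc in page1]
page2_ids = [doc["id"] for doc in page2]
duplicates = sorted(set(page1_ids) & set(page2_ids))

print("page 1:", page1_ids)
print("page 2:", page2_ids)
print("seen twice:", duplicates)
```

Running it shows document 2 on both pages, while document 5 never appears on either page the client fetched.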

In situations where you're relying on document score (either you don't use $orderby or you're using $orderby=search.score()), paging can return duplicate results because each query might be handled by a different replica, and that replica may have different term and document frequency statistics. The difference can be enough to change the relative ordering of documents at page boundaries.
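The same effect can be sketched for score-based ordering. Everything here is made up for illustration: the document ids, the scores, and the two "replicas" are hypothetical, and the only assumption is that two replicas score the same documents slightly differently.

```python
def fetch_page_by_score(replica_scores, top, skip):
    """Hypothetical stand-in for a page ordered by search.score() on one replica."""
    ordered = sorted(replica_scores, key=lambda doc_id: replica_scores[doc_id],
                     reverse=True)
    return ordered[skip:skip + top]


# Made-up scores: the replicas disagree slightly about documents "b" and "c".
replica_a = {"a": 0.90, "b": 0.52, "c": 0.50, "d": 0.10}
replica_b = {"a": 0.90, "b": 0.49, "c": 0.51, "d": 0.10}

# Page 1 happens to be served by replica A, page 2 by replica B.
page1 = fetch_page_by_score(replica_a, top=2, skip=0)
page2 = fetch_page_by_score(replica_b, top=2, skip=2)

repeated = [doc_id for doc_id in page2 if doc_id in page1]
missed = sorted(set(replica_a) - set(page1) - set(page2))

print("page 1:", page1)
print("page 2:", page2)
print("repeated:", repeated, "missed:", missed)
```

In this sketch the small score difference swaps "b" and "c" right at the page boundary, so "b" is returned twice and "c" is never returned.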

For these reasons, it’s important to think of Azure Search as a search engine (because it is), and not a general-purpose database.

Upvotes: 6
