Silly John

Reputation: 1704

Azure Search Index - Find the list of items

We are using an Azure Search index to power one of our APIs. The index is populated by Azure Functions that pull data from the database. We noticed that the number of records in the database and in the Search service differ. Is there any way to get the list of keys in the Search service so that we can compare them with the database and see which keys are missing?

Regards,

John
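Once the keys from the index have been retrieved (the answers below cover how), finding the gaps is a set difference. A minimal sketch, assuming both key lists are already loaded in memory as strings; the function name `diff_keys` is hypothetical:

```python
def diff_keys(db_keys, index_keys):
    """Return (missing_from_index, extra_in_index) as sorted lists."""
    db = set(db_keys)
    idx = set(index_keys)
    # Keys present in the database but never indexed.
    missing = sorted(db - idx)
    # Keys still in the index but no longer in the database (e.g. stale deletes).
    extra = sorted(idx - db)
    return missing, extra


missing, extra = diff_keys(["1", "2", "3", "4"], ["1", "3", "5"])
print(missing)  # ['2', '4']
print(extra)    # ['5']
```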

Upvotes: 1

Views: 1376

Answers (2)

howie

Reputation: 2685

You can search for "*" and use orderby and filter to page through all the data, as in the following example. I use metadata_storage_last_modified as the filter field.

    offset            skip       time
    0        --%-->   0
    100,000  --%-->   100,000    getLastTime
    101,000  --%-->   0          useLastTime
    200,000  --%-->   99,000     useLastTime
    201,000  --%-->   100,000    useLastTime & getLastTime
    202,000  --%-->   0          useLastTime

Because the skip limit is 100k, we can calculate skip as:

    AzureSearchSkipLimit = 100k
    AzureSearchTopLimit = 1k
    skip = offset % (AzureSearchSkipLimit + AzureSearchTopLimit)
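A quick check of the formula against the offset table, reusing the answer's two limits (the Python constant names and the `skip_for_offset` helper are only for illustration):

```python
AZURE_SEARCH_SKIP_LIMIT = 100_000  # maximum $skip the service accepts
AZURE_SEARCH_TOP_LIMIT = 1_000     # page size ($top) used in this answer


def skip_for_offset(offset: int) -> int:
    # Every 101,000 documents (100k skipped plus one final 1k page), the filter
    # moves to the last timestamp seen and skip starts again from 0.
    return offset % (AZURE_SEARCH_SKIP_LIMIT + AZURE_SEARCH_TOP_LIMIT)


for offset in (0, 100_000, 101_000, 200_000, 201_000, 202_000):
    print(offset, skip_for_offset(offset))
# prints: 0 0, 100000 100000, 101000 0, 200000 99000, 201000 100000, 202000 0
```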

If the total search count will be larger than AzureSearchSkipLimit, apply:

    orderby = "metadata_storage_last_modified desc"

When skip reaches AzureSearchSkipLimit, take the metadata_storage_last_modified value of the last document in that page and use it as the filter for the next 100k results:

    filter = metadata_storage_last_modified lt ${metadata_storage_last_modified}
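A minimal sketch of the whole loop, assuming a hypothetical `fetch_page` callable that wraps one search request (search "*", orderby metadata_storage_last_modified desc, skip, top, plus the lt filter once a boundary timestamp is known) and returns (key, timestamp) pairs. The limits are parameters so the logic can be exercised with small numbers and an in-memory stand-in for the index. Note that a strict lt filter can miss documents that share the boundary timestamp, so this assumes timestamps are effectively unique.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical wrapper around one search request: given the boundary timestamp
# (None = no filter yet), skip and top, return (key, timestamp) pairs ordered by
# timestamp descending, restricted to timestamp < boundary when one is set.
FetchPage = Callable[[Optional[float], int, int], List[Tuple[str, float]]]


def collect_keys_by_timestamp(
    fetch_page: FetchPage, top: int = 1_000, skip_limit: int = 100_000
) -> List[str]:
    keys: List[str] = []
    boundary: Optional[float] = None  # "getLastTime" result, used as the lt filter
    skip = 0
    while True:
        page = fetch_page(boundary, skip, top)
        keys.extend(key for key, _ in page)
        if len(page) < top:
            return keys  # a short page means there is nothing left
        if skip + top > skip_limit:
            # The next skip would exceed the limit: take the oldest timestamp seen
            # (last row, since results are descending) and restart at skip 0.
            boundary = page[-1][1]
            skip = 0
        else:
            skip += top


# Usage with an in-memory stand-in for the index (25 documents, tiny limits).
docs = [(f"doc{i}", float(i)) for i in range(25)]


def fake_fetch(boundary, skip, top):
    assert skip <= 6  # the stand-in enforces the scaled-down skip limit
    rows = sorted(docs, key=lambda d: d[1], reverse=True)
    if boundary is not None:
        rows = [d for d in rows if d[1] < boundary]
    return rows[skip:skip + top]


keys = collect_keys_by_timestamp(fake_fetch, top=3, skip_limit=6)
print(len(keys), len(set(keys)))  # 25 25
```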

Upvotes: 1

Pablo Castro

Reputation: 1681

The Azure Search query API is designed for search and filter scenarios; it doesn't offer an efficient way to traverse all documents.

That said, you can do this reasonably well by scanning the keys in order. If your index has a field (the key field or another one) that is both filterable and sortable, use $select to pull only the keys for each document, 1000 at a time, ordered by that field. After you retrieve the first 1000, don't use $skip (which limits you to 100,000 documents); instead, add a filter that applies greater-than against that field, using the highest value you saw in the previous response. This lets you traverse the whole set with reasonable performance, although going 1000 at a time will take a while.
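A minimal sketch of this key-scanning loop, assuming a hypothetical `fetch_keys` callable that issues one request with search "*", $select on the key field, $orderby on that field ascending, $top, and a gt filter once a previous key is known. Because key values are unique, the gt bound cannot skip documents. The field name `id` and the helper `odata_gt_filter` are illustrative; the helper only shows how such a filter string could be built, with single quotes doubled as OData string literals require.

```python
from typing import Callable, List, Optional

# Hypothetical wrapper around one search request with search=*, $select=<key>,
# $orderby=<key> asc and $top=top, plus $filter=<key> gt '<last_key>' when
# last_key is not None. Returns the keys of that page in ascending order.
FetchKeys = Callable[[Optional[str], int], List[str]]


def odata_gt_filter(field: str, value: str) -> str:
    # OData string literals use single quotes; an embedded quote is doubled.
    escaped = value.replace("'", "''")
    return f"{field} gt '{escaped}'"


def collect_all_keys(fetch_keys: FetchKeys, top: int = 1_000) -> List[str]:
    keys: List[str] = []
    last_key: Optional[str] = None  # the first request has no filter
    while True:
        page = fetch_keys(last_key, top)
        keys.extend(page)
        if len(page) < top:
            return keys  # short page: the scan is complete
        last_key = page[-1]  # highest key so far becomes the next lower bound


# Usage with an in-memory stand-in for the index.
index_keys = sorted(f"k{i:05d}" for i in range(2_500))


def fake_fetch_keys(last_key, top):
    rows = [k for k in index_keys if last_key is None or k > last_key]
    return rows[:top]


all_keys = collect_all_keys(fake_fetch_keys)
print(len(all_keys))                    # 2500
print(odata_gt_filter("id", "k00999"))  # id gt 'k00999'
```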

Upvotes: 2
