Reputation: 3427
Our SQL Server database table consists of approximately 2 million records. The Azure Search index that imports from it is now up to 9.5 million documents. Our indexer is set to run every 24 hours, and we have Change Tracking turned ON in our database. We need to know how to properly configure the indexer to only import new records every 24 hours.
I am wondering if the issue comes from how our database writes its records. Every 24 hours, all 2 million records are gathered from the web and written to a staging table. A stored procedure then compares the data in the staging table to the master data table. If the records for a certain Source ID are matched between the staging table and the master table, the staging table records replace those records in the master table, even when the records are identical. Records in the master table with a Source ID that doesn't exist in the staging table are left alone. So even though most of the records written to the master data table merely replace existing identical data, does the Azure Search indexer see those as new records?
At any given time, if our database only has 2 million records, then the Azure Search index should only have 2 million documents. But it continues to add documents, seemingly because of the rewrite process described above, and this is what we need to know how to fix.
EDIT: I noted an option for a "Soft Delete Policy." Is this something that would help us, as it would also delete documents in the index which correspond with records in the database that have been replaced?
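For context, a soft delete policy is configured on the data source definition, not the indexer. A sketch of what that might look like (the `IsDeleted` column name and marker value here are hypothetical; the actual column and value depend on your schema):

```json
{
  "name": "my-sql-datasource",
  "type": "azuresql",
  "credentials": { "connectionString": "..." },
  "container": { "name": "MasterTable" },
  "dataDeletionDetectionPolicy": {
    "@odata.type": "#Microsoft.Azure.Search.SoftDeleteColumnDeletionDetectionPolicy",
    "softDeleteColumnName": "IsDeleted",
    "softDeleteMarkerValue": "true"
  }
}
```

Note that this policy only removes index documents whose rows are *marked* deleted in the source; rows that are physically deleted or replaced under a new ID are not detected.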
EDIT #2: I've uploaded an Excel spreadsheet here which diagrams the flow I'm trying to explain, in case it helps to visualize it: https://dl.dropboxusercontent.com/u/8477791/v4AzureSearchIndexFlow.xlsx
Upvotes: 0
Views: 724
Reputation: 8634
The fact that new documents continue to accumulate has nothing to do with change detection or deletion detection. The issue is that your source rows get new IDs when they change.
Azure Search bases document identity solely on the value of the key field of your index. Change tracking determines what is read from the data source; it has no bearing on how that data is indexed. The only factor there is the ID of each document. Documents with matching IDs will be updated, while IDs that aren't in the index will result in new documents being uploaded.
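The upsert behavior described above can be sketched with a plain dictionary keyed on the document key field (illustrative only, not the Azure Search API):

```python
# Simulate an Azure Search index as a dict keyed on the key field ("id").
index = {}

def upsert(documents):
    """Merge a batch into the index: matching keys update the existing
    document, unseen keys create new documents (like 'mergeOrUpload')."""
    for doc in documents:
        index[doc["id"]] = doc

# First nightly run: two source rows.
upsert([{"id": "1", "name": "a"}, {"id": "2", "name": "b"}])

# Next run the source regenerates the same logical records, but the
# rewrite assigns NEW IDs, so the index grows instead of updating.
upsert([{"id": "3", "name": "a"}, {"id": "4", "name": "b"}])

print(len(index))  # 4 documents for 2 logical records
```

This is why the index climbs toward 9.5 million documents from a 2 million row table: each nightly rewrite produces rows whose key values the index has never seen.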
One possible workaround is to use a different column of your SQL table as the document key, but this only works if that column's values are unique per record and remain stable across your nightly staging-table rewrite.
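For reference, the document key is whichever index field is marked with `"key": true` in the index definition, so switching keys means pointing that flag at a stable column (field names here are hypothetical examples):

```json
{
  "name": "my-index",
  "fields": [
    { "name": "stableRecordId", "type": "Edm.String", "key": true },
    { "name": "sourceId", "type": "Edm.String" }
  ]
}
```

Because the key cannot be changed on an existing index, this typically means creating a new index with the desired key field and re-running the indexer against it.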
Upvotes: 0