Reputation: 43
Recently Microsoft published the Microsoft Search API (beta) which provides the possibility to index external systems by creating an MS Graph search custom connector. I created such a connector that was successful so far. I also pushed a few items to the index and in the MS admin center, I created a result type and a vertical. Now I'm able to find the regarded external items in the SharePoint Online modern search center in a dedicated tab belonging to the search vertical created before. So far so good.
But now I wonder:
How can I achieve that the external data is continuously pushed to the MS Search Index? (How can this be implemented? Is there any tutorial or a sample project? What is the underlying architecture?)
Is there a concept of Full / Incremental / Continuous Crawls for a Search Custom Connector at all? If so, how can I "hook" into a crawl in order to update changed data to the index?
Or do I have to implement it all on my own? And if so, what would be a suitable approach?
Upvotes: 2
Views: 506
Reputation: 74
I recently implemented incremental crawling for Graph connectors using Azure functions. I created a timer triggered function that fetches the items updated in the data source since the time of the last function run and then updates the search index with the updated items.
I also wrote a blog post around this approach considering a SharePoint list as the data source. The entire source code can be found at https://github.com/aakashbhardwaj619/function-search-connector-crawler. Hope it would be useful.
Upvotes: 1
Reputation: 181
Thank you for trying out the connector APIs. I am glad to hear that you are able to get items into the index and see the results.
Regarding your questions, the logic for determining when to push items, and your crawl strategy is something that you need to implement on your own. There is no one best strategy per se, and it will depend on your data source and the type of access you have to that data. For example, do you get notifications every time the data changes? If not, how do you determine what data has changed? If none of that is possible, you might need to do a periodic full recrawl, but you will need to consider the size of your data set for ingestion.
We will look into ways to reduce the amount of code you have to write in the future, but right now, this is something you have to implement on your own.
-James
Upvotes: 3