Indexing a Cosmos DB database with the pull approach, plus files

Question

I have items and files, with a one-to-many relationship between them. Items are stored in a relational database and files in folders; the association between items and files is also stored in the relational database. Files can be PDFs, Word documents, emails, etc. I intend to build a proof of concept (POC) with Cognitive Search so that I can search items and their associated documents.

My current understanding is that a pull approach might be cheaper than a push approach when using Cognitive Search (the latency requirements are not stringent, and eventual consistency is OK). Hence, I intend to move the data into a Cosmos DB database, which can then be indexed via the pull approach. I'm curious how this works with the documents. Would I need to crack them (extract their text) on premises?
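For the move from the relational database into Cosmos DB, one possible shape is one JSON document per item with its associated file paths embedded. This is a minimal sketch under stated assumptions: an in-memory SQLite database stands in for the relational source, and the table names (`items`, `item_files`), columns, the `docType`/`itemId` fields, the `item-` id prefix, and the sample rows are all hypothetical. Each resulting dict could then be written to a container with the azure-cosmos SDK (for example `ContainerProxy.upsert_item`).

```python
from __future__ import annotations

import json
import sqlite3


def items_to_cosmos_docs(conn: sqlite3.Connection) -> list[dict]:
    """Denormalize items and their 1:m file association into one JSON-ready dict per item."""
    docs: dict[int, dict] = {}
    rows = conn.execute(
        """
        SELECT i.item_id, i.title, i.description, f.file_path
        FROM items AS i
        LEFT JOIN item_files AS f ON f.item_id = i.item_id
        ORDER BY i.item_id, f.file_path
        """
    )
    for item_id, title, description, file_path in rows:
        doc = docs.setdefault(
            item_id,
            {
                # Cosmos DB ids are strings; the hypothetical "item-" prefix keeps
                # them distinct from file documents if both land in one search index.
                "id": f"item-{item_id}",
                "docType": "item",
                "itemId": str(item_id),
                "title": title,
                "description": description,
                "files": [],
            },
        )
        # A LEFT JOIN yields file_path = NULL (None) for items without files.
        if file_path is not None:
            doc["files"].append(file_path)
    return list(docs.values())


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Hypothetical sample schema and rows for illustration only.
    conn.executescript(
        """
        CREATE TABLE items (item_id INTEGER PRIMARY KEY, title TEXT, description TEXT);
        CREATE TABLE item_files (item_id INTEGER REFERENCES items (item_id), file_path TEXT);
        INSERT INTO items VALUES (1, 'Pump A', 'Centrifugal pump'), (2, 'Valve B', 'Check valve');
        INSERT INTO item_files VALUES (1, 'items/1/manual.pdf'), (1, 'items/1/quote.docx');
        """
    )
    for doc in items_to_cosmos_docs(conn):
        print(json.dumps(doc))
```

Embedding only the file paths keeps the item documents small; the file contents themselves would still have to be cracked somewhere, either by an indexer or by your own code.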

There is also the option of storing the documents as attachments or in Blob Storage; the latter is most likely more future-proof. I assume that if I put the documents into Blob Storage, Cognitive Search indexing would still need to crack the documents and apply skills?
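If the documents go into Blob Storage, the item-to-file association still has to travel with each blob so that indexed files can be tied back to their items. One way, sketched here as an assumption rather than a prescribed layout, is to store the item id as user-defined blob metadata: the blob indexer extracts user-defined metadata properties as fields, and it also cracks the file content itself. The container name, the `items/{id}/` prefix, and the `itemId`/`docType` metadata keys are hypothetical. The actual upload could use the azure-storage-blob SDK's `BlobClient.upload_blob(data, metadata=...)`.

```python
from __future__ import annotations

import posixpath
import re

# Characters outside this set are replaced in blob names (a conservative, assumed convention).
_UNSAFE_CHARS = re.compile(r"[^A-Za-z0-9._-]")


def plan_blob_uploads(associations: list[tuple[int, str]], container: str = "item-files") -> list[dict]:
    """Turn (item_id, file_path) rows from the association table into blob upload instructions."""
    plan = []
    for item_id, file_path in associations:
        # Normalize Windows share paths so basename works on any OS.
        file_name = posixpath.basename(file_path.replace("\\", "/"))
        blob_name = f"items/{item_id}/{_UNSAFE_CHARS.sub('_', file_name)}"
        plan.append(
            {
                "container": container,
                "blobName": blob_name,
                "sourcePath": file_path,
                # User-defined blob metadata; keys must be valid C# identifiers.
                "metadata": {"itemId": str(item_id), "docType": "file"},
            }
        )
    return plan


if __name__ == "__main__":
    rows = [(1, r"\\fileserver\share\Manual v2.pdf"), (1, "/mnt/files/quote.docx")]
    for entry in plan_blob_uploads(rows):
        print(entry["blobName"], entry["metadata"])
```

Because the item id rides along as metadata, no cracking has to happen on premises; the indexer reads the content from the blob and the `itemId` from its metadata.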

Jennifer Marsman - MSFT · Accepted Answer

This sounds like a good approach. In terms of data sources, Cognitive Search supports Cosmos DB, Blob Storage, and some relational databases. I would probably:

1. Create a new Cognitive Search resource in the Azure portal.
2. In that Cognitive Search resource, click "Import data" to create a new indexer (this is the "pull" option you mention above). You may want to do this twice, assuming your items are in Cosmos DB or a relational database and your documents are stored separately in Blob Storage:
   - The first indexer has a data source that points to your items and relationship data in whichever database you decide to put them, applies any skills you want, and writes everything to an index.
   - The second indexer has a different data source that points to your documents in Blob Storage, applies any skills you want, and writes everything to the same index.
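The same two-indexer setup can also be scripted against the Search REST API instead of going through "Import data". The sketch below only builds the request bodies; it assumes a single shared index in which item documents (from Cosmos DB) and file documents (from Blob Storage) coexist as separate documents, told apart by a `docType` field and linked by an `itemId` field that is stored in the item documents and in each blob's user-defined metadata. All resource names, field names, and the hourly schedule are hypothetical, and no skillset is attached (an indexer can reference one via `skillsetName`). Each tuple would be sent with an `api-key` header holding an admin key and `Content-Type: application/json`.

```python
from __future__ import annotations

import json

API_VERSION = "2020-06-30"  # a GA version of the Azure Cognitive Search REST API


def pull_setup_requests(service: str, cosmos_conn: str, blob_conn: str,
                        index: str = "items-and-files") -> list[tuple[str, str, dict]]:
    """Return (method, url, body) tuples: one index, two data sources, two indexers."""
    base = f"https://{service}.search.windows.net"
    index_def = {
        "name": index,
        "fields": [
            {"name": "id", "type": "Edm.String", "key": True, "filterable": True},
            {"name": "docType", "type": "Edm.String", "filterable": True, "facetable": True},
            {"name": "itemId", "type": "Edm.String", "filterable": True},
            {"name": "title", "type": "Edm.String", "searchable": True},
            {"name": "description", "type": "Edm.String", "searchable": True},
            {"name": "files", "type": "Collection(Edm.String)", "filterable": True},
            {"name": "content", "type": "Edm.String", "searchable": True},
            {"name": "metadata_storage_name", "type": "Edm.String", "searchable": True},
        ],
    }
    cosmos_ds = {
        "name": "items-cosmos",
        "type": "cosmosdb",
        # Expected form: AccountEndpoint=...;AccountKey=...;Database=...
        "credentials": {"connectionString": cosmos_conn},
        "container": {"name": "items"},
        # Incremental pulls based on Cosmos DB's _ts timestamp.
        "dataChangeDetectionPolicy": {
            "@odata.type": "#Microsoft.Azure.Search.HighWaterMarkChangeDetectionPolicy",
            "highWaterMarkColumnName": "_ts",
        },
    }
    blob_ds = {
        "name": "files-blob",
        "type": "azureblob",
        "credentials": {"connectionString": blob_conn},
        "container": {"name": "item-files"},
    }
    items_indexer = {
        "name": "items-indexer",
        "dataSourceName": cosmos_ds["name"],
        "targetIndexName": index,
        "schedule": {"interval": "PT1H"},
    }
    files_indexer = {
        "name": "files-indexer",
        "dataSourceName": blob_ds["name"],
        "targetIndexName": index,
        "schedule": {"interval": "PT1H"},
        # The indexer cracks each blob: text goes to "content", metadata to metadata_* fields.
        "parameters": {"configuration": {"dataToExtract": "contentAndMetadata"}},
        # Blob paths are not valid document keys, so they are base64-encoded into "id".
        "fieldMappings": [
            {
                "sourceFieldName": "metadata_storage_path",
                "targetFieldName": "id",
                "mappingFunction": {"name": "base64Encode"},
            }
        ],
    }
    q = f"api-version={API_VERSION}"
    return [
        ("PUT", f"{base}/indexes/{index}?{q}", index_def),
        ("PUT", f"{base}/datasources/{cosmos_ds['name']}?{q}", cosmos_ds),
        ("PUT", f"{base}/datasources/{blob_ds['name']}?{q}", blob_ds),
        ("PUT", f"{base}/indexers/{items_indexer['name']}?{q}", items_indexer),
        ("PUT", f"{base}/indexers/{files_indexer['name']}?{q}", files_indexer),
    ]


if __name__ == "__main__":
    for method, url, body in pull_setup_requests("my-search", "<cosmos-conn>", "<blob-conn>"):
        print(method, url)
        print(json.dumps(body, indent=2))
```

Keeping each file as its own document is one design choice for the 1:m relationship: if both indexers wrote to the same key per item, later files would overwrite earlier files' content. With separate documents, a query can filter on the item, e.g. `$filter=itemId eq '1'`.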

If you use indexers, they take care of document cracking (extracting text and metadata from PDFs, Word documents, emails, and so on). If you push data directly into the index, you need to crack the documents yourself.
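For contrast, here is a push-approach sketch: your own code has to crack each file before uploading it through the documents endpoint (`POST /indexes/{index}/docs/index`) as a `{"value": [...]}` batch of actions. The `crack_document` helper is a hypothetical stand-in that only reads plain-text files; real PDFs, Word documents, and emails would need a text-extraction library, which is exactly the work an indexer does for you. The field names and the key scheme are assumptions.

```python
from __future__ import annotations

import base64
import json
import tempfile
from pathlib import Path


def crack_document(path: Path) -> str:
    """Hypothetical cracking step: only .txt is handled in this sketch."""
    if path.suffix.lower() != ".txt":
        # PDFs, Word documents and emails need a real text-extraction library here.
        raise NotImplementedError(f"no extractor for {path.suffix!r} in this sketch")
    return path.read_text(encoding="utf-8", errors="replace")


def build_push_batch(item_id: str, files: list[Path]) -> dict:
    """Build a request body for POST /indexes/{index}/docs/index."""
    actions = []
    for path in files:
        # URL-safe base64 without padding yields a valid document key.
        key = base64.urlsafe_b64encode(str(path).encode("utf-8")).decode("ascii").rstrip("=")
        actions.append(
            {
                "@search.action": "mergeOrUpload",
                "id": key,
                "docType": "file",
                "itemId": item_id,
                "metadata_storage_name": path.name,
                "content": crack_document(path),
            }
        )
    return {"value": actions}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        note = Path(tmp) / "notes.txt"
        note.write_text("Maintenance notes for item 1", encoding="utf-8")
        print(json.dumps(build_push_batch("1", [note]), indent=2))
```

The `mergeOrUpload` action makes re-running the push idempotent for unchanged keys, but change tracking, retries, and batching (at most 1,000 documents per request) all become your responsibility, which is part of why the pull approach is simpler here.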

This walkthrough shows how to create an indexer in the portal (the skillset is optional; point the data source at your own data): https://learn.microsoft.com/en-us/azure/search/cognitive-search-quickstart-blob
