Reputation: 161
Does anyone have a hint on how to define a cognitive index (and structure its data) to support the following scenarios?
Scenario 1
Suppose the data contains a table of products (product, country, number of units, price), and clients should be able to get answers to questions such as:
In which country is product 1 the cheapest?
What is the total number of units of product 1 outside Europe?
Scenario 2
Is it possible to have multiple sources of our own data, e.g. two cognitive indexes, or a mix of a cognitive index and blob storage?
Alternatively, how should an index be structured when it contains both data and text (articles)? Let's say we have a few articles about product 1 on top of what we have in the table.
How should the data be organised so that the chat can answer both parts of a question like this:
Give me a summary of the product 1 production process (from those articles), and what is the average price (from the table)?
Thanks in advance.
Upvotes: 1
Views: 384
Reputation: 14619
Scenario 1:
When indexing your document, make sure you extract the table structure. This does not happen by default: with a simple "read" (OCR) pass you might lose the details, because the text is not necessarily returned in the right order.
Example when using simple OCR on the table: the rows fall apart, because the content is read as:
Product
Country
product 1
UK
product 2
USA
product 1
Germany
11
999
Number of units
Price
10
1,000
5
1,300
So you need to use a service that can read tables. For example, you can use Azure AI Document Intelligence (formerly Form Recognizer) to get the table details, i.e. each cell together with its row and column position.
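A minimal sketch of that extraction step, assuming the `azure-ai-formrecognizer` Python package and its `prebuilt-layout` model. The endpoint, key, and file path are placeholders, `cells_to_grid` and `extract_tables` are hypothetical helper names (not part of the SDK), and the demo cells at the bottom are made-up stand-ins so the grid logic can be tried without an Azure resource:

```python
from collections import namedtuple


def cells_to_grid(cells, row_count, column_count):
    """Rebuild a row-major grid from cells that carry row/column indexes."""
    grid = [["" for _ in range(column_count)] for _ in range(row_count)]
    for cell in cells:
        grid[cell.row_index][cell.column_index] = cell.content
    return grid


def extract_tables(endpoint, key, path):
    """Analyze a document with the layout model; return one grid per table.

    Assumes azure-ai-formrecognizer is installed. endpoint, key and path
    are placeholders supplied by the caller.
    """
    from azure.ai.formrecognizer import DocumentAnalysisClient
    from azure.core.credentials import AzureKeyCredential

    client = DocumentAnalysisClient(endpoint, AzureKeyCredential(key))
    with open(path, "rb") as f:
        poller = client.begin_analyze_document("prebuilt-layout", document=f)
        result = poller.result()
    return [
        cells_to_grid(t.cells, t.row_count, t.column_count)
        for t in result.tables
    ]


# Offline illustration with stand-in cells (values are made up).
Cell = namedtuple("Cell", "row_index column_index content")
demo_cells = [
    Cell(0, 0, "Product"), Cell(0, 1, "Country"),
    Cell(1, 0, "product 1"), Cell(1, 1, "UK"),
]
demo_grid = cells_to_grid(demo_cells, row_count=2, column_count=2)
```

Unlike the plain OCR read-out above, each value stays attached to its row and column, so "product 1" and "UK" remain on the same row.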
Once you have this extraction, you can format it however you like before storing it in your index. For example, if you store it as an HTML table, queries about the details can work (demo made using Azure OpenAI Studio).
I don't know which format is best for storing the table: HTML is verbose, so maybe simple Markdown would be enough, or JSON?
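To compare the lighter formats, here is a small sketch that renders one extracted grid as a Markdown table and as JSON rows. The function names are hypothetical, and the sample row is illustrative only (how values pair up into rows in the original table is an assumption):

```python
import json


def to_markdown(grid):
    """Render a grid (first row = header) as a Markdown table."""
    header, *rows = grid
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    lines += ["| " + " | ".join(row) + " |" for row in rows]
    return "\n".join(lines)


def to_json_rows(grid):
    """Render a grid as a JSON list with one object per data row."""
    header, *rows = grid
    return json.dumps([dict(zip(header, row)) for row in rows])


# Illustrative values only; the row pairing is an assumption.
grid = [
    ["Product", "Country", "Number of units", "Price"],
    ["product 1", "UK", "10", "1,000"],
]
markdown_chunk = to_markdown(grid)
json_chunk = to_json_rows(grid)
```

Markdown keeps the chunk short, while JSON repeats the column names on every row, which costs more tokens but makes each row self-describing if the table gets split across chunks.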
Scenario 2:
Here I think it would be more interesting to split the question (using LangChain or Semantic Kernel), since it is in fact two different questions. You could also add a field to the index that marks which data is global and which is 'product specific'.
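A rough sketch of both ideas. A naive regex stands in for the LangChain / Semantic Kernel decomposition step (a real setup would more likely ask the model to split the question), and `scope` and `product` are hypothetical, filterable index fields; the filter string uses the OData syntax that Azure search filters accept:

```python
import re

QUESTION_WORDS = r"(?:what|which|how|who|where|when)"


def split_question(question):
    """Naively split a compound question on 'and' followed by a question word."""
    pattern = r"\s+and\s+(?=" + QUESTION_WORDS + r"\b)"
    parts = re.split(pattern, question, flags=re.IGNORECASE)
    return [p.strip(" ?.") for p in parts if p.strip()]


def build_filter(scope, product=None):
    """Build an OData filter on the hypothetical 'scope' and 'product' fields."""
    clauses = [f"scope eq '{scope}'"]
    if product is not None:
        clauses.append(f"product eq '{product}'")
    return " and ".join(clauses)


sub_questions = split_question(
    "Give me a summary of the product 1 production process "
    "and what is the average price?"
)
article_filter = build_filter("product", "product 1")
table_filter = build_filter("global")
```

Each sub-question can then be sent as its own search query with the matching filter, and the two answers combined in the final response.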
Upvotes: 2