Reputation: 339
I have an Azure Cognitive Search index which indexes data from multiple data sources. Each data source is indexed with a near identical indexer. Each indexer calls the same skillset configuration.
Within the index definition I have a field labeled "datasource" which is intended to identify the data source for a particular document. I would like the indexer, or a modular skill such as a conditional skill, to set the value of this field based on the data source. I understand it is possible to use a conditional skill to set the value of a field if a value is not found, but I want to avoid having to create a new skillset for every indexer. My data sources are documents of multiple types in blob containers.
Using only the indexer definition, is it possible to assign a string value to a field, either manually in the definition or by somehow extracting the name of the data source? Or can this be done with a modular skill in the skillset definition?
An avenue I have been pursuing is setting user-specified blob metadata at the container level. However, I have not been able to successfully retrieve this information with either the indexer or skillset. I do not want to set this user-specified blob metadata on every single blob in a container.
Upvotes: 0
Views: 481
Reputation: 1472
The only way I've found to do this is what you're suggesting you would prefer to avoid: adding a new skillset per-data-source (not per-indexer). Obviously if you want to have some other skills in play as well as this, then you do need a new skillset per index.
In my case, I have multiple data sources being brought into a single index. In order to identify what that source originally was, I need to add essentially a hard-coded value into the enriched document, and then use outputFieldMappings to get it into the index.
My "skill" looks like this:
{
  "name": "add-custom-source-value-skillset",
  "description": "Add 'custom-source' as the value in the 'source' property",
  "skills": [
    {
      "@odata.type": "#Microsoft.Skills.Util.ConditionalSkill",
      "name": "Add source",
      "description": "Add source value",
      "context": "/document",
      "inputs": [
        {
          "name": "condition",
          "source": "= $(/document/id) == 'XYZ'",
          "inputs": []
        },
        {
          "name": "whenTrue",
          "source": "= 'custom-source'",
          "inputs": []
        },
        {
          "name": "whenFalse",
          "source": "= 'custom-source'",
          "inputs": []
        }
      ],
      "outputs": [
        {
          "name": "output",
          "targetName": "source"
        }
      ]
    }
  ],
  "@odata.etag": "...."
}
The fact that I've invented a nonsense "condition", which I've combined with doing the same thing whether it resolves as true or false (which I felt was more explicit) really points to this being something of a hack.
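If you do end up with one skillset per data source, generating the definitions from a template at least avoids maintaining them by hand. The following is only a sketch of that idea in Python: the function names, the naming scheme for the skillset, and the "datasource" target field are my own assumptions, and it only builds the JSON bodies (you would still have to send them to the service yourself).

```python
import json


def build_source_skillset(source_label: str) -> dict:
    """Build a skillset body that always writes source_label to /document/source.

    Mirrors the conditional-skill hack above: both branches return the same literal.
    The naming scheme is hypothetical; adjust it to your own conventions.
    """
    if "'" in source_label:
        # Keep the skill expression simple: no quoting/escaping is attempted here.
        raise ValueError("source_label must not contain a single quote")
    literal = f"= '{source_label}'"
    return {
        "name": f"add-{source_label}-source-value-skillset",
        "description": f"Add '{source_label}' as the value in the 'source' property",
        "skills": [
            {
                "@odata.type": "#Microsoft.Skills.Util.ConditionalSkill",
                "name": "Add source",
                "description": "Add source value",
                "context": "/document",
                "inputs": [
                    {"name": "condition", "source": "= $(/document/id) == 'XYZ'"},
                    {"name": "whenTrue", "source": literal},
                    {"name": "whenFalse", "source": literal},
                ],
                "outputs": [{"name": "output", "targetName": "source"}],
            }
        ],
    }


def build_output_field_mapping(target_field: str = "datasource") -> dict:
    """Indexer outputFieldMappings entry copying /document/source into the index field."""
    return {"sourceFieldName": "/document/source", "targetFieldName": target_field}


if __name__ == "__main__":
    # Hypothetical labels, one per data source.
    for label in ("invoices", "contracts"):
        print(json.dumps(build_source_skillset(label), indent=2))
    print(json.dumps(build_output_field_mapping(), indent=2))
```

Any other skills you need would have to be appended to the "skills" list of each generated skillset, which is the same duplication as before, just automated.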
You're right that it would surely be preferable to do this kind of thing in some kind of metadata on the datasource which could be made available to the indexer, maybe at some path like /datasource/foo instead of looking in /document/foo, but alas ...
Upvotes: 0
Reputation: 376
Unfortunately it is not possible to configure a blob data source in a way that will pass unique information to the skillset. Having a separate skillset per datasource may be the cleanest option. Alternatively, you could pass metadata_storage_path to a custom skill and parse the container path to return a value by convention or mapping.
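As a rough illustration of the custom skill route, here is a minimal Python sketch of the request handling such a skill might do. It assumes the skill receives metadata_storage_path as an unencoded blob URL under an input named "metadata_storage_path", and that the first path segment is the container name. The CONTAINER_TO_SOURCE mapping, the function names, and the "datasource" output name are all hypothetical. Hosting (for example in an Azure Function) is left out.

```python
from urllib.parse import unquote, urlparse

# Hypothetical mapping from container name to the label stored in the index.
# Containers not listed here fall back to the container name itself.
CONTAINER_TO_SOURCE = {"contracts": "legal", "invoices": "finance"}


def container_from_path(storage_path: str) -> str:
    """Return the first path segment of a blob URL, assumed to be the container."""
    segments = [s for s in urlparse(storage_path).path.split("/") if s]
    if not segments:
        raise ValueError(f"no container segment in {storage_path!r}")
    return unquote(segments[0])


def handle_request(body: dict) -> dict:
    """Process a custom skill request body ({"values": [...]}) and build the response."""
    results = []
    for record in body.get("values", []):
        out = {"recordId": record["recordId"], "data": {}, "errors": [], "warnings": []}
        try:
            container = container_from_path(record["data"]["metadata_storage_path"])
            out["data"]["datasource"] = CONTAINER_TO_SOURCE.get(container, container)
        except (KeyError, ValueError) as exc:
            # Report the failure for this record only; other records still succeed.
            out["errors"].append({"message": f"Could not derive data source: {exc!r}"})
        results.append(out)
    return {"values": results}
```

Because the value is derived from the path by convention, the same skillset can be shared by every indexer, at the cost of hosting and maintaining the skill endpoint.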
Upvotes: 0