patyx

Reputation: 339

Can Azure Cognitive Search Indexer set field values?

I have an Azure Cognitive Search index that indexes data from multiple data sources. Each data source is indexed with a near-identical indexer, and each indexer calls the same skillset configuration.

Within the index definition I have a field labeled "datasource" which is intended to identify the data source for a particular document. I would like to have the indexer, or a modular skill such as a conditional skill, set the value of this field based on the data source. I understand it is possible to use a conditional skill to set the value of a field if a value is not found, but I want to avoid having to create a new skillset for every indexer. My data sources are documents of multiple types in blob containers.

Using only the indexer definition, is it possible to assign a string value to a field, either manually in the definition or by somehow extracting the name of the data source? Failing that, can it be done with a modular skill in the skillset definition?

An avenue I have been pursuing is setting user-specified blob metadata at the container level. However, I have not been able to successfully retrieve this information with either the indexer or skillset. I do not want to set this user-specified blob metadata on every single blob in a container.

Upvotes: 0

Views: 481

Answers (2)

pete otaqui

Reputation: 1472

The only way I've found to do this is what you're suggesting you would prefer to avoid: adding a new skillset per-data-source (not per-indexer). Obviously if you want to have some other skills in play as well as this, then you do need a new skillset per index.

In my case, I have multiple data sources being brought into a single index. To identify what the source originally was, I need to add what is essentially a hard-coded value to the enriched document, and then use the indexer's outputFieldMappings to get it into the index.

My "skill" looks like this:

{
  "name": "add-custom-source-value-skillset",
  "description": "Add 'custom-source' as the value in the 'source' property",
  "skills": [
    {
      "@odata.type": "#Microsoft.Skills.Util.ConditionalSkill",
      "name": "Add source",
      "description": "Add source value",
      "context": "/document",
      "inputs": [
        {
          "name": "condition",
          "source": "= $(/document/id) == 'XYZ'",
          "inputs": []
        },
        {
          "name": "whenTrue",
          "source": "= 'custom-source'",
          "inputs": []
        },
        {
          "name": "whenFalse",
          "source": "= 'custom-source'",
          "inputs": []
        }
      ],
      "outputs": [
        {
          "name": "output",
          "targetName": "source"
        }
      ]
    }
  ],
  "@odata.etag": "...."
}

I've invented a nonsense "condition" and return the same value whether it resolves as true or false (which I felt was more explicit). That really points to this being something of a hack.
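If hand-writing one of these per data source is the painful part, the definitions can be generated from a template. The following is only a sketch of that idea in Python: the function names `build_source_skillset` and `build_output_field_mapping` are made up for illustration, the skillset naming scheme is just the one used above, and it assumes the label contains no single quotes. The service-generated "@odata.etag" is left out. The resulting dictionaries still have to be sent to the service by whatever means you normally use to create skillsets and indexers.

```python
import json


def build_source_skillset(source_label: str) -> dict:
    """Build a skillset definition that writes a fixed label to /document/source.

    Mirrors the conditional-skill workaround: both branches return the same
    literal, so the (arbitrary) condition never affects the result.
    Assumes source_label contains no single quotes.
    """
    literal = f"= '{source_label}'"
    skill = {
        "@odata.type": "#Microsoft.Skills.Util.ConditionalSkill",
        "name": "Add source",
        "description": "Add source value",
        "context": "/document",
        "inputs": [
            {"name": "condition", "source": "= $(/document/id) == 'XYZ'"},
            {"name": "whenTrue", "source": literal},
            {"name": "whenFalse", "source": literal},
        ],
        "outputs": [{"name": "output", "targetName": "source"}],
    }
    return {
        "name": f"add-{source_label}-value-skillset",
        "description": f"Add '{source_label}' as the value in the 'source' property",
        "skills": [skill],
    }


def build_output_field_mapping(index_field: str = "datasource") -> dict:
    """Indexer output field mapping from the enriched value to the index field."""
    return {"sourceFieldName": "/document/source", "targetFieldName": index_field}


if __name__ == "__main__":
    # Print the definition for one hypothetical data source label.
    print(json.dumps(build_source_skillset("custom-source"), indent=2))
```

Each indexer then references the skillset generated for its data source and includes the mapping from `build_output_field_mapping` in its outputFieldMappings.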

You're right that it would surely be preferable to do this kind of thing in some kind of metadata on the datasource which could be made available to the indexer - maybe at some path like /datasource/foo instead of looking in /document/foo, but alas ...

Upvotes: 0

Corom - MSFT

Reputation: 376

Unfortunately, it is not possible to configure a blob data source in a way that will pass unique information to the skillset. Having a separate skillset per data source may be the cleanest option. Alternatively, you could pass metadata_storage_path to a custom skill and parse the container name out of the path, returning a value either by convention or through a mapping.
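As a rough sketch of the custom skill option, the Python below shows the core logic such a skill might contain. It assumes the skill is registered as a custom Web API skill with a single input named `path` sourced from /document/metadata_storage_path, that the value arrives as a plain blob URL of the form https://<account>.blob.core.windows.net/<container>/<blob>, and that the output is named `datasource`. The `CONTAINER_TO_SOURCE` mapping and both function names are hypothetical, and the HTTP hosting (for example, an Azure Function) is left out.

```python
from urllib.parse import urlparse

# Hypothetical mapping from container name to the label stored in the index.
# Containers not listed here fall back to the container name itself.
CONTAINER_TO_SOURCE = {"contracts": "legal", "invoices": "finance"}


def container_from_path(storage_path: str) -> str:
    """Return the first path segment of a blob URL, i.e. the container name."""
    path = urlparse(storage_path).path
    return path.lstrip("/").split("/", 1)[0]


def handle_skill_request(body: dict) -> dict:
    """Process a custom skill request body and build the response body.

    Expects {"values": [{"recordId": ..., "data": {"path": ...}}, ...]} and
    returns one result per record, keyed by the same recordId.
    """
    results = []
    for record in body.get("values", []):
        path = record.get("data", {}).get("path", "")
        container = container_from_path(path)
        result = {
            "recordId": record.get("recordId"),
            "data": {},
            "errors": [],
            "warnings": [],
        }
        if container:
            # Map by lookup, or by convention (the container name) if unmapped.
            result["data"]["datasource"] = CONTAINER_TO_SOURCE.get(container, container)
        else:
            result["warnings"].append(
                {"message": "Could not determine container from path"}
            )
        results.append(result)
    return {"values": results}
```

With this approach a single skillset can serve every indexer, because the value is derived from each document's own path rather than from anything set on the data source.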

Upvotes: 0
