fluidguid
fluidguid

Reputation: 1681

Skillset and Indexer are not populating data from Json array into an edm.ComplexType field

I am using Azure Search with api-version=2020-06-30. The goal is to populate an Azure search index with database output displayed below. The index has simple (Edm.String) and complex type (Edm.ComplexType) fields:

Database input:

enter image description here

Here is exploded presentation of Json array of field2. The goal is to populate that field in complex type field in Azure search index. The reason I must have field2 as a Json array is that the values in that field is going to be dynamic, and can be 0 to many elements in it. If I do not keep that field in a Json array, and keep it flattened, then the same document will be iterated multiple times.

[
  {
    "f1": "Element First - Value for f1",
    "f2": "Element First - Description for f2"
  },
  {
    "f1": "Element Second - Value for f1",
    "f2": "Element Second - Description for f2"
  }
]

Definition of Azure search index that need populating above database object:

enter image description here

When I search in Azure search explorer, I see the output as under:

enter image description here

When Indexer tries to populate the Index, I do see a warning in Indexer. So, I understand that I may be doing something wrong in Indexer or Skillset with fields field2, f1 or f2. I just don't know where to fix it, Skillset, indexer, somewhere else?

Expression language parsing issues:
Missing value '/document/field2/f1'.
Missing value '/document/field2/f2'.

I have tried multiple ways of structuring Index, Skillset, Skill and Indexer. Those tries either have ended up in error or warning while populating field field2 of tpe edm.ComplexType. Here are the definitions of each object I have tried.

Index, DataSource, Skillset and Indexer are created using API on Postman using below bodies. I did not copy paste all the variations of definitions I have tried, but I have tried annotating a few in these bodies:

Index:

{  
  "name": "dev-search1-hi",  
  "fields": [  
    {  
      "name": "idColumn",  
      "type": "Edm.String",  
      "searchable": true,  
      "filterable": true,  
      "sortable": true,  
      "facetable": true,  
      "key": true,  
      "retrievable": true,  
      "analyzer": "en.lucene"
    },
    {  
      "name": "title",  
      "type": "Edm.String",  
      "searchable": true,  
      "filterable": true,  
      "sortable": true,  
      "facetable": true,  
      "key": false,  
      "retrievable": true,  
      "analyzer": "en.lucene"
    },
    {  
      "name": "field2",  
      "type": "Edm.ComplexType",     <- I have tried changing this type to Collection(Edm.ComplexType), but then I had received an error:  The data field 'field2' in the document with key 'ARTICLE-DOC-V3-001' has an invalid value of type 'Edm.ComplexType' (JSON objects map to Edm.Complex). The expected type was 'Collection(Edm.ComplexType)'.
      "fields" : [
        {  
        "name": "f1",  
        "type": "Edm.String",  
        "searchable": true,  
        "filterable": true,
        "facetable": true,  
        "retrievable": true,  
        "analyzer": "en.lucene"
        },
        {  
        "name": "f2",  
        "type": "Edm.String",  
        "searchable": true,  
        "filterable": true,
        "facetable": true,  
        "retrievable": true,  
        "analyzer": "en.lucene"
        }
      ]
    }
  ]
}

Datasource:

{   
    "name" : "dev-sql0-hi",  
    "type": "azuresql",
    "credentials": { "connectionString" : "myConnectionStringValue" },
    "container" : { "name" : "vwMyViewName" }
} 

Skillset and skill:

{
    "name": "dev-skillset1-hi",
    "skills": [
        {
            "@odata.type": "#Microsoft.Skills.Util.ShaperSkill",
            "context": "/document",
            "inputs": [
                {
                    "name": "f1",
                    "source": "/document/field2/f1"   <- I have tried remove /f1 from this source, but then it just populates f1 field in index with entire value of field2 from database.
                },
                {
                    "name": "f2",
                    "source": "/document/field2/f2"   <- I have tried remove /f2 from this source, but then it just populates f2 field in index with entire value of field2 from database.
                }
            ],
            "outputs": [
                {
                    "name": "output",
                    "targetName": "skillsetTargetName1"
                }
            ]
        }
    ]
}

Indexer:

{   
    "dataSourceName" : "dev-sql0-hi",  
    "targetIndexName" : "dev-search1-hi",
    "skillsetName": "dev-skillset1-hi",
    "disabled" : false,
    "fieldMappings" : [
        { "sourceFieldName" : "idColumn", "targetFieldName" : "idColumn" },
        { "sourceFieldName" : "title", "targetFieldName" : "title" }
    ],
    "outputFieldMappings" : 
    [
        {
            "sourceFieldName" : "/document/skillsetTargetName1", 
            "targetFieldName" : "field2"
        }
    ]
} 

A few of the articles I have read to help me debug are these:

How to model complex data types in Azure Cognitive Search

https://learn.microsoft.com/en-us/azure/search/search-howto-complex-data-types

Shaper cognitive skill

https://learn.microsoft.com/en-us/azure/search/cognitive-search-skill-shaper

How to map AI-enriched fields to a searchable index

https://learn.microsoft.com/en-us/azure/search/cognitive-search-output-field-mapping

You may think my problem is similar to these Stackoverflow questions, but they are not:

How to create a field mapping in Azure Search with a complex targetField

How to index complex types into Edm.ComplexType with Azure Cognitive Search

Maybe something is wrong in my skillset definition and I need to change it? The more and more I look at articles, stackoverflow questions and my definitions, I feel I have non-workable data source (i.e. Json array field), and maybe Shaper skillset is not enough to work that field? Maybe I need a custom skill?

Thank you in advance!

Upvotes: 0

Views: 1137

Answers (1)

fluidguid
fluidguid

Reputation: 1681

After struggling and keep looking for a solution, I found one. I am not sure if this is the solution, but it is a solution. This solution works as of Aug 10, 2021 with api version 2020-06-30.

Solution:

I am now using Microsoft.Skills.Custom.WebApiSkill, not Microsoft.Skills.Util.ShaperSkill. Database sends field2 data in string format, and Custom WebApiSkill transforms that into a C# object.

  1. Indexer fetches field2 data (Json array) in string format from database
  2. Custom WebApiSkill sends that field to your API using HTTP POST (or PUT)
  3. In your API, transform that string (i.e. Json array) into C# array (using Newtonsoft.Json or whatever works for you), or do whatever else processing you need to do.
  4. Send the C# object back to Skillset
  5. There is a certain format of input and output that is needed in custom WebApiSkill. The C# models are also here in this post.

New definitions of objects are as under:

Index:

{  
  "name": "dev-search1-hi",  
  "fields": [  
    {  
      "name": "idColumn",  
      "type": "Edm.String",  
      "searchable": true,  
      "filterable": true,  
      "sortable": true,  
      "facetable": true,  
      "key": true,  
      "retrievable": true,  
      "analyzer": "en.lucene"
    },
    {  
      "name": "title",  
      "type": "Edm.String",  
      "searchable": true,  
      "filterable": true,  
      "sortable": true,  
      "facetable": true,  
      "key": false,  
      "retrievable": true,  
      "analyzer": "en.lucene"
    },
    {  
      "name": "field2s",  
      "type": "Collection(Edm.ComplexType)",  
      "fields" : [
            {  
            "name": "f1",  
            "type": "Edm.String",  
            "searchable": true,  
            "filterable": true,
            "facetable": true,  
            "retrievable": true,  
            "analyzer": "en.lucene"
            },
            {  
            "name": "f2",  
            "type": "Edm.String",  
            "searchable": true,  
            "filterable": true,
            "facetable": true,  
            "retrievable": true,  
            "analyzer": "en.lucene"
            }
        ]
    }
  ]
}

Skillset and skill:

{
    "name": "dev-skillset1-hi",
    "skills": [
        {
            "@odata.type": "#Microsoft.Skills.Custom.WebApiSkill",
            "name": "myCustomSkill1",
            "uri": "https://my_host_and_api_address_here.com/api/v1/transformMyData",
            "httpHeaders": {},
            "httpMethod": "POST",
            "timeout": "PT3M50S",
            "batchSize": 1,
            "degreeOfParallelism": 5,
            "context": "/document",
            "inputs": [
                {
                    "name": "field2sJson",
                    "source": "/document/field2sJson"
                }
            ],
            "outputs": [
                {
                    "name": "field2s",
                    "targetName": "field2s"
                }
            ]
        }
    ]
}

Indexer:

{   
    "dataSourceName" : "dev-sql0-hi",  
    "targetIndexName" : "dev-search1-hi",
    "skillsetName": "dev-skillset1-hi",
    "disabled" : false,
    "fieldMappings" : [
        { "sourceFieldName" : "idColumn", "targetFieldName" : "idColumn" },
        { "sourceFieldName" : "title", "targetFieldName" : "title" }
    ],
    "outputFieldMappings" : 
    [
        {
            "sourceFieldName" : "/document/field2s", 
            "targetFieldName" : "field2s"
        }
    ]
}

C# Controller:

[HttpPost]
[Route("transformMyData")]
public CustomWebApiSkillData TransformMyData([FromBody] CustomWebApiSkillData customWebApiSkillData)
{
    // For my needs, all I am doing is transformation of a string (Json array) into a C#.
    // However, you can pass whatever input field you need from Skillset and send back whatever output you need in Skillset.
    // For performance sake - just remember, this Custom WebApiSkill is going to be called for every single record (unless there is a way to call it for all the records at once - which I don't know yet).
    if (customWebApiSkillData == null || customWebApiSkillData.Values == null) return null;
    foreach (var value in customWebApiSkillData.Values)
    {
        if (value.Data != null && !string.IsNullOrWhiteSpace(value.Data.Field2sJson))
        {
            // here is where I convert my string (i.e. Json array) into C# array 
            value.Data.Field2s = CallYourFunctionHere(value.Data.Field2sJson);
        }
    }
    return customWebApiSkillData;
}

C# model for controller endpoint:

[SerializePropertyNamesAsCamelCase]
public class CustomWebApiSkillData
{
    [JsonProperty("values")]
    public List<Field2Value> Values { get; set; }
}

[SerializePropertyNamesAsCamelCase]
public class Field2Value
{
    [JsonProperty("recordId")]
    public string RecordId { get; set; }
    
    [JsonProperty("data")]
    public Field2Data Data { get; set; }
    
    [JsonProperty("errors", NullValueHandling = NullValueHandling.Include)]
    public List<Field2ValueValidation> Errors { get; set; }
    
    [JsonProperty("warnings", NullValueHandling = NullValueHandling.Include)]
    public Field2ValueValidation Warnings { get; set; }
}

[SerializePropertyNamesAsCamelCase]
public class Field2ValueValidation
{
    [JsonProperty("message")]
    public string Message { get; set; }
}

[SerializePropertyNamesAsCamelCase]
public class Field2Data
{
    [JsonProperty("field2sJson")]
    public string Field2sJson { get; set; }
    
    [JsonProperty("field2s")]
    public List<Field2> Field2s { get; set; }
}

[SerializePropertyNamesAsCamelCase]
public class Field2
{
    [JsonProperty("f1")]
    public string F1{ get; set; }

    [JsonProperty("f2")]
    public string F2 { get; set; }
}

Sql server screenshot:

enter image description here

Now when I search in Azure search explorer, I see the output as under:

enter image description here

I hope this answer is helpful to future visitors. Good luck and thank you!

Upvotes: 2

Related Questions