MajorMajorMajorMajor

Reputation: 71

Mapping custom skill output to complex type not working as expected using shaper skill

problem

I have tried three approaches to map the output of my custom skill to an Edm.ComplexType field in my search index. None of them populates the field. Each document in the search index needs to contain the following chunk_object field.

index field definition

{"name": "chunk_object",
      "type": "Edm.ComplexType",
      "fields": [
        {
          "name": "chunk_content",
          "type": "Edm.String",
          "searchable": true,
          "filterable": true,
          "retrievable": true,
          "stored": true,
          "sortable": true,
          "facetable": true,
          "key": false,
          "indexAnalyzer": null,
          "searchAnalyzer": null,
          "analyzer": "standard.lucene",
          "normalizer": null,
          "dimensions": null,
          "vectorSearchProfile": null,
          "vectorEncoding": null,
          "synonymMaps": []
        },
        {
          "name": "page_start",
          "type": "Edm.Int64",
          "searchable": false,
          "filterable": true,
          "retrievable": true,
          "stored": true,
          "sortable": true,
          "facetable": true,
          "key": false,
          "indexAnalyzer": null,
          "searchAnalyzer": null,
          "analyzer": null,
          "normalizer": null,
          "dimensions": null,
          "vectorSearchProfile": null,
          "vectorEncoding": null,
          "synonymMaps": []
        },
        {
          "name": "page_end",
          "type": "Edm.Int64",
          "searchable": false,
          "filterable": true,
          "retrievable": true,
          "stored": true,
          "sortable": true,
          "facetable": true,
          "key": false,
          "indexAnalyzer": null,
          "searchAnalyzer": null,
          "analyzer": null,
          "normalizer": null,
          "dimensions": null,
          "vectorSearchProfile": null,
          "vectorEncoding": null,
          "synonymMaps": []
        },
        {
          "name": "chunk_idx",
          "type": "Edm.String",
          "searchable": true,
          "filterable": true,
          "retrievable": true,
          "stored": true,
          "sortable": true,
          "facetable": true,
          "key": false,
          "indexAnalyzer": null,
          "searchAnalyzer": null,
          "analyzer": "standard.lucene",
          "normalizer": null,
          "dimensions": null,
          "vectorSearchProfile": null,
          "vectorEncoding": null,
          "synonymMaps": []
        }
      ]
}

custom skill output

The custom skill's output is mapped to /document/jsonChunks/*; jsonChunks contains 239 objects. Sample response from the skill (shortened to two chunks):

{
    "values": [
        {
            "recordId": "1",
            "data": {
                "jsonChunks": [
                    {
                        "chunk": "this is chunk 1",
                        "page_start": 1,
                        "page_end": 1,
                        "chunk_idx": "#1-file.pdf'"
                    },
                    {
                        "chunk": "this is chunk 2",
                        "page_start": 1,
                        "page_end": 1,
                        "chunk_idx": "#1-file.pdf'"
                    }
                ]
            }
        }
    ]
}
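For reference, a custom skill that returns a response like the one above would typically be declared as a WebApiSkill along these lines. This is a hypothetical sketch, not the actual definition used here: the skill name, the uri, and the text input sourced from /document/content are placeholders. What it illustrates is that the output name has to equal the jsonChunks key under data, and that with context /document the array is written to /document/jsonChunks.

```json
{
  "@odata.type": "#Microsoft.Skills.Custom.WebApiSkill",
  "name": "chunking-skill",
  "description": "Hypothetical custom chunker returning jsonChunks",
  "context": "/document",
  "uri": "https://example.com/api/chunk",
  "httpMethod": "POST",
  "inputs": [
    {
      "name": "text",
      "source": "/document/content"
    }
  ],
  "outputs": [
    {
      "name": "jsonChunks",
      "targetName": "jsonChunks"
    }
  ]
}
```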

in-memory output of custom skill

The in-memory enriched data structure for /document/jsonChunks/* is as follows:

-/document/jsonChunks Object[239]
  -/*
    -/chunk_content
    -/page_start
    -/page_end
    -/chunk_idx

my approach

I will share the shaper skill definition and the in-memory enriched structure for each approach.

approach 1

skill definition

{
    "@odata.type": "#Microsoft.Skills.Util.ShaperSkill",
    "name": "#2",
    "description": "",
    "context": "/document",
    "inputs": [
        {
            "name": "chunk_content",
            "source": "/document/jsonChunks/*/chunk"
        },
        {
            "name": "page_start",
            "source": "/document/jsonChunks/*/page_start"
        },
        {
            "name": "page_end",
            "source": "/document/jsonChunks/*/page_end"
        },
        {
            "name": "chunk_idx",
            "source": "/document/jsonChunks/*/chunk_idx"
        }
    ],
    "outputs": [
        {
            "name": "output",
            "targetName": "chunk_object"
        }
    ]
}

in-memory output

/document/chunk_object Object
  -/chunk_content Object[239]
    -/*
  -/page_start Object[239]
    -/*
  -/page_end Object[239]
    -/*
  -/chunk_idx Object[239]
    -/*

approach 2

skill definition

{
  "@odata.type": "#Microsoft.Skills.Util.ShaperSkill",
  "name": "#2",
  "description": "",
  "context": "/document",
  "inputs": [
    {
      "name": "jsonChunk",
      "source": "/document/jsonChunks/*"
    }
  ],
  "outputs": [
    {
      "name": "output",
      "targetName": "chunk_object"
    }
  ]
}

in-memory output

/document/chunk_object Object
  -/jsonChunk Object[239]
    -/*
      -/chunk_content
      -/page_start
      -/page_end
      -/chunk_idx

approach 3

skill definition

{
  "@odata.type": "#Microsoft.Skills.Util.ShaperSkill",
  "name": "#2",
  "description": "",
  "context": "/document",
  "inputs": [
    {
      "name": "jsonChunk",
      "sourceContext": "/document/jsonChunks/*",
      "inputs": [
        {
          "name": "chunk_content",
          "source": "/document/jsonChunks/*/chunk"
        },
        {
          "name": "page_start",
          "source": "/document/jsonChunks/*/page_start"
        },
        {
          "name": "page_end",
          "source": "/document/jsonChunks/*/page_end"
        },
        {
          "name": "chunk_idx",
          "source": "/document/jsonChunks/*/chunk_idx"
        }
      ]
    }
  ],
  "outputs": [
    {
      "name": "output",
      "targetName": "chunk_object"
    }
  ]
}

in-memory output

/document/chunk_object Object
  -/jsonChunk Object[239]
    -/*
      -/chunk_content
      -/page_start
      -/page_end
      -/chunk_idx

None of these approaches works; the index field stays empty. Can anyone suggest the right approach here? Thanks in advance.

Upvotes: 0

Views: 80

Answers (1)

Suresh Chikkam

Reputation: 3473

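The chunk_object field has to match the shape of the data mapped into it. Each document produces 239 chunks, so chunk_object must be declared as Collection(Edm.ComplexType) rather than a single Edm.ComplexType, and its sub-field names must match the property names of the shaped objects (the custom skill emits chunk, but the index expects chunk_content).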
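A sketch of chunk_object declared as a collection of complex objects, so that it can hold one entry per chunk. It is based on the definition in the question; the attributes left out (stored, key, the null analyzer and vector settings) can stay as they were. sortable is set to false on every sub-field because, as far as I know, sub-fields of a complex collection are multi-valued and can't be sortable.

```json
{
  "name": "chunk_object",
  "type": "Collection(Edm.ComplexType)",
  "fields": [
    {
      "name": "chunk_content",
      "type": "Edm.String",
      "searchable": true,
      "filterable": true,
      "retrievable": true,
      "sortable": false,
      "facetable": true,
      "analyzer": "standard.lucene"
    },
    {
      "name": "page_start",
      "type": "Edm.Int64",
      "searchable": false,
      "filterable": true,
      "retrievable": true,
      "sortable": false,
      "facetable": true
    },
    {
      "name": "page_end",
      "type": "Edm.Int64",
      "searchable": false,
      "filterable": true,
      "retrievable": true,
      "sortable": false,
      "facetable": true
    },
    {
      "name": "chunk_idx",
      "type": "Edm.String",
      "searchable": true,
      "filterable": true,
      "retrievable": true,
      "sortable": false,
      "facetable": true,
      "analyzer": "standard.lucene"
    }
  ]
}
```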

  • Use the ShaperSkill to transform the custom skill's output into the desired structure: each element of /document/jsonChunks/* should become one object inside chunk_object.
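Using the two sample chunks from the question, the document that reaches the index should end up looking roughly like this: an array with one object per chunk, and chunk renamed to chunk_content. This illustrates the target shape only; it is not output captured from a real indexer run.

```json
{
  "chunk_object": [
    {
      "chunk_content": "this is chunk 1",
      "page_start": 1,
      "page_end": 1,
      "chunk_idx": "#1-file.pdf'"
    },
    {
      "chunk_content": "this is chunk 2",
      "page_start": 1,
      "page_end": 1,
      "chunk_idx": "#1-file.pdf'"
    }
  ]
}
```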

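Shaper Skill Definition:

{
  "@odata.type": "#Microsoft.Skills.Util.ShaperSkill",
  "name": "shaper-skill",
  "description": "Shape each chunk to match the complex type in the index",
  "context": "/document",
  "inputs": [
    {
      "name": "jsonChunk",
      "sourceContext": "/document/jsonChunks/*",
      "inputs": [
        {
          "name": "chunk_content",
          "source": "/document/jsonChunks/*/chunk"
        },
        {
          "name": "page_start",
          "source": "/document/jsonChunks/*/page_start"
        },
        {
          "name": "page_end",
          "source": "/document/jsonChunks/*/page_end"
        },
        {
          "name": "chunk_idx",
          "source": "/document/jsonChunks/*/chunk_idx"
        }
      ]
    }
  ],
  "outputs": [
    {
      "name": "output",
      "targetName": "chunk_object"
    }
  ]
}

  • Inputs:

    jsonChunk uses sourceContext /document/jsonChunks/*, so the shaper builds one object per chunk and renames chunk to chunk_content along the way.

  • Outputs:

    A Shaper skill has a single output, which must be named output. With targetName chunk_object, the shaped chunks land at /document/chunk_object/jsonChunk as an array of 239 objects with the sub-fields chunk_content, page_start, page_end, and chunk_idx. This is the same structure your approach 3 produces; what is still missing is getting it into the index. Skill outputs are not written to the index automatically, so /document/chunk_object/jsonChunk has to be mapped to the chunk_object field in the indexer's outputFieldMappings.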
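A minimal sketch of the indexer fragment, assuming the shaped array sits at /document/chunk_object/jsonChunk (the path shown for approach 3 in the question) and the index field is named chunk_object. Only the relevant property is shown; merge it into your existing indexer definition rather than replacing the indexer with it.

```json
{
  "outputFieldMappings": [
    {
      "sourceFieldName": "/document/chunk_object/jsonChunk",
      "targetFieldName": "chunk_object"
    }
  ]
}
```

If you would rather skip the shaper, changing the custom skill to return chunk_content instead of chunk should let you map /document/jsonChunks straight to chunk_object, since the property names would then already match the sub-fields.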


Upvotes: 0
