Reputation: 71
I have tried three approaches to map the output of my custom skill to an Edm.ComplexType field in my search index. None of them populates the field. Each document in the search index needs to contain the following chunk_object field:
{"name": "chunk_object",
"type": "Edm.ComplexType",
"fields": [
{
"name": "chunk_content",
"type": "Edm.String",
"searchable": true,
"filterable": true,
"retrievable": true,
"stored": true,
"sortable": true,
"facetable": true,
"key": false,
"indexAnalyzer": null,
"searchAnalyzer": null,
"analyzer": "standard.lucene",
"normalizer": null,
"dimensions": null,
"vectorSearchProfile": null,
"vectorEncoding": null,
"synonymMaps": []
},
{
"name": "page_start",
"type": "Edm.Int64",
"searchable": false,
"filterable": true,
"retrievable": true,
"stored": true,
"sortable": true,
"facetable": true,
"key": false,
"indexAnalyzer": null,
"searchAnalyzer": null,
"analyzer": null,
"normalizer": null,
"dimensions": null,
"vectorSearchProfile": null,
"vectorEncoding": null,
"synonymMaps": []
},
{
"name": "page_end",
"type": "Edm.Int64",
"searchable": false,
"filterable": true,
"retrievable": true,
"stored": true,
"sortable": true,
"facetable": true,
"key": false,
"indexAnalyzer": null,
"searchAnalyzer": null,
"analyzer": null,
"normalizer": null,
"dimensions": null,
"vectorSearchProfile": null,
"vectorEncoding": null,
"synonymMaps": []
},
{
"name": "chunk_idx",
"type": "Edm.String",
"searchable": true,
"filterable": true,
"retrievable": true,
"stored": true,
"sortable": true,
"facetable": true,
"key": false,
"indexAnalyzer": null,
"searchAnalyzer": null,
"analyzer": "standard.lucene",
"normalizer": null,
"dimensions": null,
"vectorSearchProfile": null,
"vectorEncoding": null,
"synonymMaps": []
}
]
}
The output of the custom skill is mapped to /document/jsonChunks/*. jsonChunks contains 239 objects. A sample of the skill's response:
{
"values": [
{
"recordId": "1",
"data": {
"jsonChunks": [
{
"chunk": "this is chunk 1",
"page_start": 1,
"page_end": 1,
"chunk_idx": "#1-file.pdf'"
},
{
"chunk": "this is chunk 2",
"page_start": 1,
"page_end": 1,
"chunk_idx": "#1-file.pdf'"
}
]
}
}
]
}
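A response of this shape could be assembled by a handler along the following lines. This is a minimal sketch, not the actual skill: the function names `chunk_record` and `build_response`, the `text` and `file_name` input fields, the fixed-size splitting, and the `chunk_idx` format are all hypothetical. Only the `values` / `recordId` / `data` / `jsonChunks` envelope mirrors the payload above.

```python
from typing import Any


def chunk_record(text: str, file_name: str, size: int = 20) -> list[dict[str, Any]]:
    """Split text into fixed-size pieces (hypothetical chunking strategy)."""
    chunks: list[dict[str, Any]] = []
    for i in range(0, len(text), size):
        chunks.append({
            "chunk": text[i:i + size],
            "page_start": 1,  # placeholder: page tracking is not modelled here
            "page_end": 1,
            "chunk_idx": f"#{len(chunks) + 1}-{file_name}",  # assumed id format
        })
    return chunks


def build_response(request: dict[str, Any]) -> dict[str, Any]:
    """Return one output record per input record, keyed by the same recordId."""
    values = []
    for record in request.get("values", []):
        data = record.get("data", {})
        values.append({
            "recordId": record["recordId"],
            "data": {
                "jsonChunks": chunk_record(data.get("text", ""), data.get("file_name", "")),
            },
        })
    return {"values": values}
```

Each element of `jsonChunks` carries the key `chunk`, so any rename to `chunk_content` has to happen later in the skillset.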
The in-memory enriched data structure for /document/jsonChunks/* (the output of the custom skill) is as follows:
-/document/jsonChunks Object[239]
-/*
-/chunk_content
-/page_start
-/page_end
-/chunk_idx
Below are the Shaper skill definition and the resulting in-memory enriched structure for each approach.
Approach 1: skill definition
{
"@odata.type": "#Microsoft.Skills.Util.ShaperSkill",
"name": "#2",
"description": "",
"context": "/document",
"inputs": [
{
"name": "chunk_content",
"source": "/document/jsonChunks/*/chunk"
},
{
"name": "page_start",
"source": "/document/jsonChunks/*/page_start"
},
{
"name": "page_end",
"source": "/document/jsonChunks/*/page_end"
},
{
"name": "chunk_idx",
"source": "/document/jsonChunks/*/chunk_idx"
}
],
"outputs": [
{
"name": "output",
"targetName": "chunk_object"
}
]
}
in-memory output
/document/chunk_object Object
-/chunk_content Object[239]
-/*
-/page_start Object[239]
-/*
-/page_end Object[239]
-/*
-/chunk_idx Object[239]
-/*
Approach 2: skill definition
{
"@odata.type": "#Microsoft.Skills.Util.ShaperSkill",
"name": "#2",
"description": "",
"context": "/document",
"inputs": [
{
"name": "jsonChunk",
"source": "/document/jsonChunks/*"
}
],
"outputs": [
{
"name": "output",
"targetName": "chunk_object"
}
]
}
in-memory output
/document/chunk_object Object
-/jsonChunk Object[239]
-/*
-/chunk_content
-/page_start
-/page_end
-/chunk_idx
Approach 3: skill definition
{
"@odata.type": "#Microsoft.Skills.Util.ShaperSkill",
"name": "#2",
"description": "",
"context": "/document",
"inputs": [
{
"name": "jsonChunk",
"sourceContext": "/document/jsonChunks/*",
"inputs": [
{
"name": "chunk_content",
"source": "/document/jsonChunks/*/chunk"
},
{
"name": "page_start",
"source": "/document/jsonChunks/*/page_start"
},
{
"name": "page_end",
"source": "/document/jsonChunks/*/page_end"
},
{
"name": "chunk_idx",
"source": "/document/jsonChunks/*/chunk_idx"
}
]
}
],
"outputs": [
{
"name": "output",
"targetName": "chunk_object"
}
]
}
in-memory output
/document/chunk_object Object
-/jsonChunk Object[239]
-/*
-/chunk_content
-/page_start
-/page_end
-/chunk_idx
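To make the difference between the results concrete, the shapes reported in the in-memory listings can be modelled in plain Python. This only illustrates the structures shown above, not how the Shaper skill is implemented. The helper names `shape_columnwise` and `shape_nested` are made up for this sketch, and the sample elements use the sub-field name `chunk_content` as it appears in the listings.

```python
from typing import Any

FIELDS = ("chunk_content", "page_start", "page_end", "chunk_idx")


def shape_columnwise(chunks: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Approach 1: one object holding a parallel array per sub-field."""
    return {name: [c[name] for c in chunks] for name in FIELDS}


def shape_nested(chunks: list[dict[str, Any]]) -> dict[str, list[dict[str, Any]]]:
    """Approaches 2 and 3: one object wrapping an array of per-chunk objects."""
    return {"jsonChunk": [{name: c[name] for name in FIELDS} for c in chunks]}


# Two sample elements taken from the custom skill output shown earlier.
chunks = [
    {"chunk_content": "this is chunk 1", "page_start": 1, "page_end": 1, "chunk_idx": "#1-file.pdf'"},
    {"chunk_content": "this is chunk 2", "page_start": 1, "page_end": 1, "chunk_idx": "#1-file.pdf'"},
]

columnwise = shape_columnwise(chunks)  # {"chunk_content": [...], "page_start": [...], ...}
nested = shape_nested(chunks)          # {"jsonChunk": [{...}, {...}]}
```

In both models the four sub-fields sit inside arrays, one level deeper than in the index definition, where they are direct scalar children of chunk_object.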
None of my approaches above are working and the index field remains unpopulated. Can anyone please suggest any pointers or the right approach here? TIA
Upvotes: 0
Views: 80
Reputation: 3473
The custom skill's output should be structured to match the chunk_object field in the index. Always check that each jsonChunk object is correctly mapped to the corresponding fields in chunk_object.
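This check could be automated with a small helper like the one below. It is a sketch under stated assumptions: `EXPECTED` is copied by hand from the sub-fields of chunk_object in the question, the Edm-to-Python type mapping covers only the two types used there, and `validate_chunk_object` is a hypothetical name, not part of any Azure SDK.

```python
# Sub-fields of chunk_object and their declared types, copied from the index definition.
EXPECTED = {
    "chunk_content": "Edm.String",
    "page_start": "Edm.Int64",
    "page_end": "Edm.Int64",
    "chunk_idx": "Edm.String",
}
# Assumed mapping from Edm types to Python types (only the two used above).
EDM_TO_PY = {"Edm.String": str, "Edm.Int64": int}


def validate_chunk_object(obj: dict) -> list[str]:
    """Return a list of problems; an empty list means the object matches."""
    problems = []
    for name, edm_type in EXPECTED.items():
        if name not in obj:
            problems.append(f"missing sub-field: {name}")
        # bool is a subclass of int in Python, so reject it explicitly.
        elif isinstance(obj[name], bool) or not isinstance(obj[name], EDM_TO_PY[edm_type]):
            problems.append(f"{name}: expected {edm_type}, got {type(obj[name]).__name__}")
    for name in obj:
        if name not in EXPECTED:
            problems.append(f"unexpected sub-field: {name}")
    return problems
```

Run on a raw custom-skill element, which still uses the key `chunk`, it reports `chunk_content` as missing and `chunk` as unexpected. That is the rename the Shaper inputs are meant to perform.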
Use a ShaperSkill to transform the output of the custom skill into the desired structure. Each jsonChunk should be mapped to a new chunk_object.
Shaper Skill Definition:
{
"@odata.type": "#Microsoft.Skills.Util.ShaperSkill",
"name": "shaper-skill",
"description": "Shape the output to match the complex type in the index",
"context": "/document",
"inputs": [
{
"name": "jsonChunks",
"source": "/document/jsonChunks"
}
],
"outputs": [
{
"name": "chunk_object",
"targetName": "chunk_object",
"inputs": [
{
"name": "chunk_content",
"source": "/document/jsonChunks/*/chunk"
},
{
"name": "page_start",
"source": "/document/jsonChunks/*/page_start"
},
{
"name": "page_end",
"source": "/document/jsonChunks/*/page_end"
},
{
"name": "chunk_idx",
"source": "/document/jsonChunks/*/chunk_idx"
}
]
}
]
}
Inputs: jsonChunks from /document/jsonChunks.
Outputs: chunk_object is populated with sub-fields chunk_content, page_start, page_end, and chunk_idx from each element in jsonChunks.
Upvotes: 0