Reputation: 710
I have the JSON structure below, returned by the Azure Form Recognizer API.
The columnCount key gives the actual number of columns.
I want to transform the cells where kind = columnHeader into columns of an Azure SQL table,
and the rest of the cells, where kind is absent, should become rows.
What is the best way to achieve this in ADF?
Note: the number of rows and columns returned by Form Recognizer will be dynamic.
{
  "rowCount": 9,
  "columnCount": 2,
  "cells": [
    {
      "kind": "columnHeader",
      "rowIndex": 0,
      "columnIndex": 0,
      "rowSpan": 1,
      "columnSpan": 1,
      "content": "ColumnA",
      "boundingRegions": [{ "pageNumber": 5, "boundingBox": [0.4461, 7.7151, 3.7924, 7.7151, 3.7924, 7.968, 0.4461, 7.968] }],
      "spans": [{ "offset": 6640, "length": 9 }]
    },
    {
      "kind": "columnHeader",
      "rowIndex": 0,
      "columnIndex": 1,
      "rowSpan": 1,
      "columnSpan": 1,
      "content": "ColumnB",
      "boundingRegions": [{ "pageNumber": 5, "boundingBox": [3.7924, 7.7151, 7.1799, 7.7151, 7.1799, 7.968, 3.7924, 7.968] }],
      "spans": [{ "offset": 6650, "length": 9 }]
    },
    {
      "rowIndex": 1,
      "columnIndex": 0,
      "rowSpan": 1,
      "columnSpan": 1,
      "content": "Cell1Value",
      "boundingRegions": [{ "pageNumber": 5, "boundingBox": [0.4461, 7.968, 3.7924, 7.968, 3.7924, 8.195, 0.4461, 8.195] }],
      "spans": [{ "offset": 6660, "length": 18 }]
    },
    {
      "rowIndex": 1,
      "columnIndex": 1,
      "rowSpan": 1,
      "columnSpan": 1,
      "content": "Cell2Value",
      "boundingRegions": [{ "pageNumber": 5, "boundingBox": [3.7924, 7.968, 7.1799, 7.968, 7.1799, 8.195, 3.7924, 8.195] }],
      "spans": [{ "offset": 6679, "length": 18 }]
    }
  ]
}
The output should look like this:
ColumnA , ColumnB
Cell1Value, Cell2Value
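To make the intended transformation concrete, here is a minimal Python sketch (not ADF) of the pivot logic: header cells become column names, and the remaining cells are grouped by rowIndex into rows. It assumes the same cells structure as the JSON above.

```python
import json

# Hypothetical illustration only: the pivot the ADF pipeline needs to perform.
doc = json.loads("""{
  "rowCount": 2, "columnCount": 2,
  "cells": [
    {"kind": "columnHeader", "rowIndex": 0, "columnIndex": 0, "content": "ColumnA"},
    {"kind": "columnHeader", "rowIndex": 0, "columnIndex": 1, "content": "ColumnB"},
    {"rowIndex": 1, "columnIndex": 0, "content": "Cell1Value"},
    {"rowIndex": 1, "columnIndex": 1, "content": "Cell2Value"}
  ]
}""")

headers = {}  # columnIndex -> column name
rows = {}     # rowIndex -> {columnIndex -> value}
for cell in doc["cells"]:
    if cell.get("kind") == "columnHeader":
        headers[cell["columnIndex"]] = cell["content"]
    else:
        rows.setdefault(cell["rowIndex"], {})[cell["columnIndex"]] = cell["content"]

columns = [headers[i] for i in sorted(headers)]
table = [[row[i] for i in sorted(row)] for _, row in sorted(rows.items())]
print(columns)  # ['ColumnA', 'ColumnB']
print(table)    # [['Cell1Value', 'Cell2Value']]
```

Because headers and rows are keyed by columnIndex and rowIndex rather than hard-coded names, the same logic handles a dynamic number of rows and columns.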
Upvotes: 0
Views: 846
Reputation: 151
This is an example that worked for me, achieved using the flatten and aggregate transformations.
When flattening, unroll by 'cells'.
Then, in the aggregate transformation, group by rowIndex and aggregate the column holding the cell text (content in your JSON) with this expression:
replace(trim(trim(toString(collect(content)),'['),']'),'"','')
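To see what that expression does to each rowIndex group, here is a hypothetical Python equivalent: collect() gathers the group's values into an array, toString() renders it as a JSON-style string, the two trims strip the outer brackets, and replace() drops the quotes, leaving a comma-separated line per row.

```python
import json

def collapse(values):
    """Mimic replace(trim(trim(toString(collect(...)),'['),']'),'"','') from the data flow."""
    s = json.dumps(values, separators=(",", ":"))  # e.g. '["ColumnA","ColumnB"]'
    s = s.strip("[").strip("]")                    # trim the outer brackets
    return s.replace('"', "")                      # drop the quotes

print(collapse(["ColumnA", "ColumnB"]))        # ColumnA,ColumnB
print(collapse(["Cell1Value", "Cell2Value"]))  # Cell1Value,Cell2Value
```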
If you want to keep using the data flow to load your table, you can create derived columns and set Azure SQL as your sink.
Alternatively, you can save to storage and then use the copy activity in your pipeline.
Upvotes: 1