Sagar
Sagar

Reputation: 710

How to Transform Form recognizer JSON structure into SQL table

I have below JSON structure which is returned by Azure Form Recognizer API.

columnCount key gives actual number of columns.

I want to transform the cells where Kind = columnHeader into Column of Azure SQL table

And rest of cells where Kind = null should convert into Rows

What is the best way to achieve this in ADF ?

  1. Use Flatten Activity in ADF
  2. Write Transformation logic using loops in Azure Function and return a CSV string
  3. Use libraries like JUST.NET inside Azure Functions

Note : Number of rows and columns returned from Form recognizer will be dynamic

{ "rowCount": 9, "columnCount": 2, "cells": [{ "kind": "columnHeader", "rowIndex": 0, "columnIndex": 0, "rowSpan": 1, "columnSpan": 1, "content": "ColumnA", "boundingRegions": [{ "pageNumber": 5, "boundingBox": [0.4461, 7.7151, 3.7924, 7.7151, 3.7924, 7.968, 0.4461, 7.968] }], "spans": [{ "offset": 6640, "length": 9 }] }, { "kind": "columnHeader", "rowIndex": 0, "columnIndex": 1, "rowSpan": 1, "columnSpan": 1, "content": "ColumnB", "boundingRegions": [{ "pageNumber": 5, "boundingBox": [3.7924, 7.7151, 7.1799, 7.7151, 7.1799, 7.968, 3.7924, 7.968] }], "spans": [{ "offset": 6650, "length": 9 }] }, { "rowIndex": 1, "columnIndex": 0, "rowSpan": 1, "columnSpan": 1, "content": "Cell1Value", "boundingRegions": [{ "pageNumber": 5, "boundingBox": [0.4461, 7.968, 3.7924, 7.968, 3.7924, 8.195, 0.4461, 8.195] }], "spans": [{ "offset": 6660, "length": 18 }] }, { "rowIndex": 1, "columnIndex": 1, "rowSpan": 1, "columnSpan": 1, "content": "Cell2Value", "boundingRegions": [{ "pageNumber": 5, "boundingBox": [3.7924, 7.968, 7.1799, 7.968, 7.1799, 8.195, 3.7924, 8.195] }], "spans": [{ "offset": 6679, "length": 18 }] }, ] }

Output should Look like this

ColumnA , ColumnB

Cell1Value, Cell2Value

Upvotes: 0

Views: 846

Answers (1)

Noelle
Noelle

Reputation: 151

This is an example that had worked for me and was achieved by using the flatten and aggregate transformations.

When flattening, unroll by 'cells'.

Flatten

FlattenDataPreview

Then, in the aggregate transformation, group by rowIndex and aggregate by text with this expression:

replace(trim(trim(toString(collect(text)),'['),']'),'"','')

AggregateDataPreview

If you wanted to continue to use dataflow to set up your table, you can create derived columns and set SQL as your sink.

TableDataPreview

Alternatively, you can save to storage and then use the copy activity in your pipeline.

Upvotes: 1

Related Questions