puri
puri

Reputation: 1849

How to convert Excel to JSON in Azure Data Factory?

I want to convert this Excel file which contains two tables in a single worksheet

enter image description here Into this JSON format

{

parent:
{
    "P1":"x1",
    "P2":"y1",
    "P3":"z1"
}
children: [
{"C1":"a1", "C2":"b1", "C3":"c1", "C4":"d1"},
{"C1":"a2", "C2":"b2", "C3":"c2", "C4":"d2"},
...
]
}

And then post the JSON to a REST endpoint.

How to perform the mapping and posting to REST service?

Also, it appears that I need to sink the JSON to a physical JSON file before I can post as a payload to REST service - is this physical sink step necessary or can it be held in memory?

I cannot use Lookup activity to read in the Excel file because it is limited to 5,000 rows and 4MB.

Upvotes: 1

Views: 1308

Answers (1)

Sally Dabbah
Sally Dabbah

Reputation: 479

I managed to do it in ADF, the solution is a bit long, but you can use azure functions to do it programmatically.

Here is a quick demo that i built:

the main idea is to split data, add headers as requested and then re-join data and add relevant keys like parents and children.

ADF:

  1. added Conditional join to split data (see attached pictures).
  2. add surrogate key for each table.
  3. filtered first row to get red off the headers in the csv.
  4. map children/parents' columns: renaming columns using derived column activity
  5. added constant value in children data flow so i can aggregate by it and convert the CSV into a complex data type.
  6. childrenArray: in a derived column,added subcolumn to a new column named Children and in values i added relevant columns.
  7. aggregated children Jsons by using the constant value.
  8. in parents dataFlow: after mapping columns , i created jsons using derived column.(please see attached pictures).
  9. joined the children array and parents jsons into one table so it will be converted to the requested Json.
  10. wrote to cached sink(here you can do the post request instead of writing to sink).

DataFlow: enter image description here

![enter image description here

Activities:

Conditional Split: enter image description here

AddSurrogateKey:

(it's the same for parents data flow just change the name of incoming stream as shown in dataflow above) enter image description here

FilterFirstRow:

enter image description here

MapChildrenColumns: enter image description here

MapParentColumns: enter image description here

AddConstantValue: enter image description here

PartentsJson: Here i added subcolumn in Expression Builder and sent column name as value,this will build the parents json. enter image description here

enter image description here

ChildrenArray: Again in a derived column, added column with a name "children" and in Expression Builder i added relevant columns.

enter image description here

Aggregate:

the purpose of this activity is to aggregate children Json's and build the array, without it you will not get an array. the aggregation function is collect().

enter image description here

enter image description here

Join Activity: Here i added an outer join to join the parents json and the children array. enter image description here

Select Relevant columns: enter image description here

Output: enter image description here

Upvotes: 1

Related Questions