Diksha Agarwal

Reputation: 23

How to convert an unstructured JSON file from Azure Cosmos DB to a structured table?

I have a JSON document with a dynamic schema in Azure Cosmos DB (Mongo API). I want to read this document, convert it into a structured SQL table, and store it in Azure SQL Data Warehouse. How do I achieve this?

I have already tried reading this unstructured data in Azure Data Factory using the Copy Activity, but it seems that ADF cannot read unstructured data.

Sample data from my Cosmos DB:

{
    "name" : "Dren",
    "details" : [
        {
            "name" : "Vinod",
            "relation" : "Father",
            "age" : 40,
            "country" : "India",
            "ph1" : "+91-9492918762",
            "ph2" : "+91-8769187451"
        },
        {
            "name" : "Den",
            "relation" : "Brother",
            "age" : 10,
            "country" : "India"
        },
        {
            "name" : "Vinita",
            "relation" : "Mother",
            "age" : 40,
            "country" : "India",
            "ph1" : "+91-9103842782"
        } ]
}

I expect NULL values for the columns whose values do not exist in the JSON document.

Upvotes: 0

Views: 706

Answers (1)

PerfectlyPanda

Reputation: 3456

As you have noticed, Data Factory doesn't manipulate unstructured data. Relequestual has correctly suggested that an outside data mapper will be required, as Azure SQL Data Warehouse does not offer JSON manipulation either. There are a couple of ways to do this from Data Factory; both involve calling another service to handle the mapping for you.

1) Have the pipeline call an Azure Function to do the work. The pipeline can't pass data in and out of the function; the function would need to read from Cosmos DB and write to Azure SQL DW on its own. Between the read and the write, you can do your mapping in whatever language you write the function in (a rough sketch of that mapping step follows option 2 below). The upside is that functions are fairly simple to write, but your ability to scale is somewhat limited by how much data a single function execution can process within a few minutes.

2) Do an interim hop in and out of Azure Data Lake. You would copy the data into a storage account (there are a few options that work with Data Lake Analytics), call a U-SQL job to do the mapping, and then load the results into Azure SQL DW. The downside is that you are adding extra reads/writes to the storage account. However, it lets you scale as much as you need based on your volume, and U-SQL is a SQL-like language if that is your preference.
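
To make option 1 a bit more concrete, here is a minimal Python sketch of the mapping step such a function could run, using the sample document from the question. The column list, table layout, and the flatten helper are illustrative assumptions rather than a fixed API; in a real function the document would come from a Cosmos DB (Mongo API) query (for example via pymongo) and the rows would be written to Azure SQL DW (for example via pyodbc).

# Illustrative mapping step for option 1. In a real Azure Function the document
# would come from a Cosmos DB (Mongo API) query rather than being hard-coded.
doc = {
    "name": "Dren",
    "details": [
        {"name": "Vinod", "relation": "Father", "age": 40, "country": "India",
         "ph1": "+91-9492918762", "ph2": "+91-8769187451"},
        {"name": "Den", "relation": "Brother", "age": 10, "country": "India"},
        {"name": "Vinita", "relation": "Mother", "age": 40, "country": "India",
         "ph1": "+91-9103842782"},
    ],
}

# Placeholder column list for the target table.
COLUMNS = ["parent_name", "name", "relation", "age", "country", "ph1", "ph2"]

def flatten(document):
    # One row per entry in "details"; keys that are absent come back as None,
    # which maps to NULL when the row is inserted into SQL.
    for detail in document.get("details", []):
        row = {"parent_name": document.get("name"), **detail}
        yield tuple(row.get(col) for col in COLUMNS)

rows = list(flatten(doc))
# e.g. ('Dren', 'Den', 'Brother', 10, 'India', None, None)
# These tuples can then be bulk-inserted with a parameterized
# INSERT INTO ... VALUES (?, ?, ?, ?, ?, ?, ?) statement.

Option 2 would express the same flattening in U-SQL instead, with the intermediate files living in the storage account.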

Upvotes: 1
