quickshare
quickshare

Reputation: 165

Use Azure Data Factory to parse JSON string from a column

Here's my situation

I have Azure Table as a source, and my target is Azure SQL database. Source table looks something like this:

id file_name metadata
1 file_1.txt {"Company": { "id": 555, "Name": "Company A" }, "quality": [{"quality": 3, "file_name": "file_1.txt"}, {"quality": 4, "file_name": "unkown"}]}
2 file_2.txt {"Company": { "id": 231, "Name": "Company B" }, "quality": [{"quality": 4, "file_name": "file_2.txt"}, {"quality": 3, "file_name": "unkown"}]}
3 file_3.txt {"Company": { "id": 111, "Name": "Company C" }, "quality": [{"quality": 5, "file_name": "unknown"}, {"quality": 4, "file_name": "file_3.txt"}]}

The target table is supposed to look like this:

id file_name company quality
1 file_1.txt Company A 3
2 file_2.txt Company B 4
3 file_3.txt Company C 4

That means that I need to parse the data from this string to get the new column values, as well as use quality value depending on the file_name column from the source.

The first thing I've done is created a Copy pipeline to transfer the data 1 to 1 from Azure Tables to parquet file on Azure Data Lake Store so I can use it as a source in Data Flow. Next, the idea was to use derived column and use some expression to get the data but as far as I can see, there's no expression that treats this string as a JSON object.

So, the next idea was to maybe add a step before this process where I would extract the contents of metadata column to a separate file on ADLS and use that file as a source or lookup and define it as a JSON file to begin with. This would imply that I need to add id value to the JSON file so I'm able to tie the data back to the record.

The first solution looked more promising as the idea, but if that's not an option, I'll look into other possible solutions.

Thanks

Upvotes: 5

Views: 14250

Answers (2)

Kiran-MSFT
Kiran-MSFT

Reputation: 234

You should use a Parse transformation. It is meant for parsing JSON from a column of data.

Upvotes: 2

Joseph  Xu
Joseph Xu

Reputation: 6043

My debug result is as follows: enter image description here

  1. I think you can use OPENJASON to parse the JSON String.
    select t.id, A.Company,A.quality,A.file_name,A.quality_s,A.file_name_s
        from dbo.test t
        CROSS APPLY OPENJSON(t.metadata) 
        WITH (
            company varchar(255) '$.Company.Name',
            quality varchar(255) '$.quality[0].quality',
            file_name varchar(255) '$.quality[0].file_name',
            quality_s varchar(255) '$.quality[1].quality',
            file_name_s varchar(255) '$.quality[1].file_name'
        ) A

The query result is as follows:
enter image description here
Then use data flow then do further processing. I will show u details when I back to my PC.

  1. Use Copy activity in ADF, copy the query result into a csv. enter image description here

  2. Use data flow to process this csv file. Set the Copy activity generated csv file as the source, data preview is as follows: enter image description here

  3. Use DerivedColumn1 to generate new columns,
    FileName : case(equalsIgnoreCase(file_name,'unknown'),file_name_s,file_name),
    QualityS: case(equalsIgnoreCase(file_name,'unknown'),quality_s,quality) enter image description here Data preview is as follows: enter image description here

  4. Use Select1 activity to filter columns which we want。 enter image description here Data preview is as follows: enter image description here

  5. Then we can sink the result to a SQL table.

Upvotes: 6

Related Questions