Reputation: 165
Here's my situation: I have an Azure Table as a source, and my target is an Azure SQL database. The source table looks something like this:
id | file_name | metadata |
---|---|---|
1 | file_1.txt | {"Company": { "id": 555, "Name": "Company A" }, "quality": [{"quality": 3, "file_name": "file_1.txt"}, {"quality": 4, "file_name": "unknown"}]} |
2 | file_2.txt | {"Company": { "id": 231, "Name": "Company B" }, "quality": [{"quality": 4, "file_name": "file_2.txt"}, {"quality": 3, "file_name": "unknown"}]} |
3 | file_3.txt | {"Company": { "id": 111, "Name": "Company C" }, "quality": [{"quality": 5, "file_name": "unknown"}, {"quality": 4, "file_name": "file_3.txt"}]} |
The target table is supposed to look like this:
id | file_name | company | quality |
---|---|---|---|
1 | file_1.txt | Company A | 3 |
2 | file_2.txt | Company B | 4 |
3 | file_3.txt | Company C | 4 |
That means I need to parse this string as JSON to get the new column values, and pick the quality value whose file_name matches the file_name column from the source.
The first thing I did was create a Copy pipeline to transfer the data 1:1 from Azure Tables to a Parquet file on Azure Data Lake Store so I can use it as a source in a Data Flow. Next, the idea was to use a Derived Column and some expression to extract the data, but as far as I can see there's no expression that treats this string as a JSON object.
So the next idea was to add a step before this process where I extract the contents of the metadata column to a separate file on ADLS, use that file as a source or lookup, and define it as a JSON file to begin with. This would imply that I need to add the id value to the JSON file so I'm able to tie the data back to the record.
The first solution looked more promising as an idea, but if that's not an option, I'll look into other possible solutions.
Thanks
Upvotes: 5
Views: 14250
Reputation: 234
You should use a Parse transformation. It is meant for parsing JSON from a column of data.
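For reference, here's a minimal data flow script sketch of what the Parse transformation could look like in this case. The stream name MySource and the output column parsedMetadata are placeholders of mine; the nested types mirror the metadata JSON in the question:

```
MySource parse(parsedMetadata = metadata ? (Company as (id as integer, Name as string),
		quality as (quality as integer, file_name as string)[]),
	format: 'json',
	documentForm: 'singleDocument') ~> ParseJson
```

Downstream you could then reference the nested fields in a derived column, e.g. parsedMetadata.Company.Name.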
Upvotes: 2
Reputation: 6043
My debug result is as follows:
```sql
-- Flatten each record's metadata JSON into columns;
-- quality[0] and quality[1] are the two entries of the quality array
SELECT t.id,
       A.company,
       A.quality,
       A.file_name,
       A.quality_s,
       A.file_name_s
FROM dbo.test t
CROSS APPLY OPENJSON(t.metadata)
WITH (
    company     varchar(255) '$.Company.Name',
    quality     varchar(255) '$.quality[0].quality',
    file_name   varchar(255) '$.quality[0].file_name',
    quality_s   varchar(255) '$.quality[1].quality',
    file_name_s varchar(255) '$.quality[1].file_name'
) A;
```
The query result is as follows:
Then use a Copy activity with the above query as its source to generate a CSV file, and use a data flow to do further processing. Set the Copy-activity-generated CSV file as the data flow's source; data preview is as follows:
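In case it helps, a rough script sketch of that source follows. The stream name CSVSource is my own, and the column list matches the query's projection, assuming the quality columns are read as integers from the CSV:

```
source(output(
		id as integer,
		company as string,
		quality as integer,
		file_name as string,
		quality_s as integer,
		file_name_s as string
	),
	allowSchemaDrift: true,
	validateSchema: false) ~> CSVSource
```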
Use DerivedColumn1 to generate new columns:

```
FileName : case(equalsIgnoreCase(file_name, 'unknown'), file_name_s, file_name)
QualityS : case(equalsIgnoreCase(file_name, 'unknown'), quality_s, quality)
```
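In data flow script form, that derived column step could look roughly like this, assuming the incoming stream is named CSVSource as in the sketch above:

```
CSVSource derive(FileName = case(equalsIgnoreCase(file_name, 'unknown'), file_name_s, file_name),
	QualityS = case(equalsIgnoreCase(file_name, 'unknown'), quality_s, quality)) ~> DerivedColumn1
```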
Data preview is as follows:
Use a Select1 transformation to keep only the columns we want.
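A possible script sketch for that select, renaming the derived columns to the target names (the `target = source` rename syntax is an assumption of mine here):

```
DerivedColumn1 select(mapColumn(
		id,
		file_name = FileName,
		company,
		quality = QualityS
	),
	skipDuplicateMapInputs: true,
	skipDuplicateMapOutputs: true) ~> Select1
```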
Data preview is as follows:
Then we can sink the result to a SQL table.
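A sink sketch along the same lines, with insert-only settings; SQLSink is a placeholder name and the exact options depend on your dataset:

```
Select1 sink(allowSchemaDrift: true,
	validateSchema: false,
	deletable: false,
	insertable: true,
	updateable: false,
	upsertable: false,
	format: 'table',
	skipDuplicateMapInputs: true,
	skipDuplicateMapOutputs: true) ~> SQLSink
```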
Upvotes: 6