Reputation: 144
I have data from SQL: a table with two columns, Id and Tags, both of type string. Tags represents an array of JSON objects:
Id | Tags |
---|---|
1 | [{'tag': 'value', 'tag1' : 'value'},{'tag':'value', 'tag1':'value'}] |
2 | [{'tag': 'value', 'tag1' : 'value'}] |
I want to rename the properties tag and tag1 dynamically, for example to property and property1. I have a JSON file that maps the old property names to the new ones:
{
"tag" : "property",
"tag1" : "property1"
}
I can easily change the property names by replacing the old ones with the new ones, using a cached sink like this:
replace(replace(Tags,"tag",CachedSink#outputs()[1].tag),"tag1",CachedSink#outputs()[1].tag1)
Id | Tags |
---|---|
1 | [{'property': 'value', 'property1' : 'value'},{'property':'value', 'property1':'value'}] |
2 | [{'property': 'value', 'property1' : 'value'}] |
After that I need to parse the Tags column into an actual array of JSON objects, which can be done with the Parse transformation by specifying an output column type.
The Parse transformation lets you specify the complex type, but apparently there is no way to reference any parameter or cached sink there.
Is there a way to achieve this?
Upvotes: 0
Views: 180
Reputation: 11489
Apart from the data types, parameters and data flow functions are not supported in the Parse transformation's output column type expression.
To get the JSON with dynamic column names, you can try the workaround below.
It involves generating the JSON file first and then copying that file to your target; use Blob storage or ADLS to store the temporary JSON file.
After replacing your keys, add the following transformations.
First, add a derived column transformation and create a constant key column (any literal value will do); its only purpose is to put every row into the same group.
Then add an aggregate transformation: in the group-by section, use that key column, and in the aggregates section, use collect(Tags). A rough sketch of the equivalent data flow script is shown below.
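This is only an illustrative sketch in data flow script, assuming the incoming stream (after the key replacement) is named ReplacedTags; the stream and column names are placeholders:
ReplacedTags derive(key = 'X') ~> AddKey
AddKey aggregate(groupBy(key),
    Tags = collect(Tags)) ~> CollectTags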
This collapses everything into a single row whose Tags column is an array containing the JSON string from each original row.
Now, concatenate all the items in this array into a single JSON string representing an array of objects, using the below expression in another derived column transformation.
replace(replace(concat('[',replace(replace(toString(Tags),'[',''),']',''),']'),'"{','{'),'}"','}')
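To see what this does, take the two rows from the question and assume toString() renders the collected array with each element double-quoted (which is what the replacements of "{ and }" rely on). The intermediate string then looks roughly like
["[{'property': 'value', 'property1': 'value'},{'property': 'value', 'property1': 'value'}]","[{'property': 'value', 'property1': 'value'}]"]
and the full expression flattens it into a single array of objects:
[{'property': 'value', 'property1': 'value'},{'property': 'value', 'property1': 'value'},{'property': 'value', 'property1': 'value'}]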
Now, add a sink that uses a DelimitedText dataset; the DelimitedText dataset is what actually writes the JSON file. Configure the dataset roughly as sketched below.
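The key idea is to point the DelimitedText dataset at a .json file name, with no header row and no quote or escape characters, so the JSON string in the single column is written out verbatim. A sketch of such a dataset definition, with placeholder linked service, container and file names:
{
    "name": "TempJsonAsDelimitedText",
    "properties": {
        "type": "DelimitedText",
        "linkedServiceName": {
            "referenceName": "AzureBlobStorage1",
            "type": "LinkedServiceReference"
        },
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "fileName": "output.json",
                "container": "temp"
            },
            "columnDelimiter": ",",
            "quoteChar": "",
            "escapeChar": "",
            "firstRowAsHeader": false
        }
    }
}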
Since a single JSON file is needed, go to the sink settings, set the output to single file, and give a target JSON file name.
In the mapping, keep only the required column.
You now have two sinks, and the cache sink must execute before this one, so set the write order to 1 for the cache sink and 2 for this sink.
After executing the data flow from a pipeline, the required JSON file will be generated in the storage location.
After the data flow activity, use a copy activity to copy this JSON file to your target location.
Upvotes: 0