Alejandro Alvarez

Reputation: 144

Set dynamic complex type in Azure Data Factory Parse activity

I have data from SQL: a table with two columns, Id and Tags, both of type string. Tags holds an array of JSON objects:

Id Tags
1 [{'tag': 'value', 'tag1' : 'value'},{'tag':'value', 'tag1':'value'}]
2 [{'tag': 'value', 'tag1' : 'value'}]

I want to rename the properties tag and tag1 dynamically, for example to property and property1. I have a JSON file that maps the old property names to the new ones:

{
  "tag" : "property",
  "tag1" : "property1"
}


I can easily change the property names by replacing the old ones with the new ones from a cached sink, like this:

replace(replace(Tags,"tag",CachedSink#outputs()[1].tag),"tag1",CachedSink#outputs()[1].tag1)
Id Tags
1 [{'property': 'value', 'property1' : 'value'},{'property':'value', 'property1':'value'}]
2 [{'property': 'value', 'property1' : 'value'}]
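
Outside of ADF, the effect of that nested replace can be sketched in Python (the rename_map dict mirrors the mapping JSON above; this is only an illustration of the string substitution, not the actual data flow):

rename_map = {"tag": "property", "tag1": "property1"}  # mirrors the mapping JSON file

tags = "[{'tag': 'value', 'tag1' : 'value'},{'tag':'value', 'tag1':'value'}]"

# Same idea as replace(replace(Tags, "tag", ...), "tag1", ...):
# plain substring replacement on the serialized JSON text.
# Note: the first replace also rewrites the "tag" prefix inside "tag1",
# which still gives the intended result here because "property" + "1" == "property1".
renamed = tags.replace("tag", rename_map["tag"]).replace("tag1", rename_map["tag1"])

print(renamed)
# [{'property': 'value', 'property1' : 'value'},{'property':'value', 'property1':'value'}]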

After that I need to parse the Tags column into an actual array of JSON objects, which can be done with the Parse transformation by specifying an output column type.
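
For reference, the goal of that step is equivalent to turning the string into real objects, as in this trivial sketch (assuming standard double-quoted JSON):

import json

tags = '[{"property": "value", "property1": "value"},{"property": "value", "property1": "value"}]'
parsed = json.loads(tags)        # a list of dicts instead of one long string
print(parsed[0]["property1"])    # -> value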


The Parse transformation lets you specify the complex type, but apparently there is no way to reference any parameter or cached sink in it.

Is there a way to achieve this?

Upvotes: 0

Views: 180

Answers (1)

Rakesh Govindula

Reputation: 11489

Apart from data types, neither parameters nor Dataflow functions are supported in the Parse transformation's expression.

To get the JSON with dynamic column names, you can try the workaround below.

This involves generating the JSON file first, and then copying that JSON to your target. Use Blob or ADLS for storing the temporary JSON file.

After replacing your keys, apply the transformations below.

First, add a derived column transformation and create a constant key column (any literal value will do).


Then add an aggregate transformation; in the group-by section use the key column created above, and in the aggregates section use collect(Tags).


This produces a single row whose Tags value is an array of the per-row Tags strings.
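
Conceptually, grouping on the constant key and collecting Tags folds every row into one array; a rough Python equivalent of those two steps (illustrative only) would be:

# Tags values after the rename step, one string per source row.
rows = [
    "[{'property': 'value', 'property1': 'value'},{'property': 'value', 'property1': 'value'}]",
    "[{'property': 'value', 'property1': 'value'}]",
]

# Derived column: give every row the same constant key.
keyed = [("x", tags) for tags in rows]

# Aggregate: group by the key and collect(Tags) -> one row holding an array of strings.
collected = {}
for key, tags in keyed:
    collected.setdefault(key, []).append(tags)

print(collected["x"])  # a single array containing both Tags strings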


Now concatenate all items in this array into a single JSON string (an array of objects) using the expression below in a derived column.

replace(replace(concat('[',replace(replace(toString(Tags),'[',''),']',''),']'),'"{','{'),'}"','}')
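
Reading the expression from the inside out: it stringifies the collected array, strips every bracket, wraps everything in a single pair of brackets, and then removes the stray quotes around each row's string. A rough Python simulation (illustrative only; it assumes toString() renders the collected array roughly as a quoted, comma-separated list):

row1 = "[{'property': 'value', 'property1': 'value'},{'property': 'value', 'property1': 'value'}]"
row2 = "[{'property': 'value', 'property1': 'value'}]"

# Assumed rendering of toString(Tags) after collect(): '["<row1>","<row2>"]'
collected = '["' + row1 + '","' + row2 + '"]'

flat = collected.replace("[", "").replace("]", "")      # drop every bracket
wrapped = "[" + flat + "]"                              # re-wrap once around everything
result = wrapped.replace('"{', "{").replace('}"', "}")  # drop the quotes around each object group

print(result)
# [{'property': 'value', 'property1': 'value'},{'property': 'value', 'property1': 'value'},{'property': 'value', 'property1': 'value'}]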


Now add a sink that uses a Delimited text dataset; this dataset is what actually writes the JSON file. Configure the dataset so the string is written unmodified (for example, no header row and no quote or escape characters), with a .json file name.


Add this as the sink. Since a single JSON file is needed, go to the sink settings, set the output to a single file, and give the target JSON file name.


In the mapping, include only the required column.


There are now two sinks, and the cache sink must execute before this one, so set the write order to 1 for the cache sink and 2 for this sink.


After executing the dataflow from the pipeline, the required JSON file is generated in the storage location.


After the dataflow activity, use a Copy activity to copy this JSON to your target location.

Upvotes: 0
