Reputation: 25
In Azure Data Factory, how can I get each separate value from a column with a string array or delimited text format to separate columns so I can perform a lookup on each item to map each item to a new value?
I have a set of codes in one column in a data table. The column format can either be a string array (`["SG", "DB", "TP"]`) or delimited text (`"SG~DB~TP"`). I would like to map these codes to their descriptions and have the descriptions end up in another column in the same format (`["single", "double", "triple"]` or `"single~double~triple"`).
I have a lookup dataset/table that contains rows with each code and description on separate lines. How can I use the lookup dataset to map each code to its description? One important point is that the number of codes in each row of the data table can vary, e.g....
If the data table column has `["SG", "DB", "TP"]`, map to `["single", "double", "triple"]`.
If the data table column has `["DB", "TP"]`, map to `["double", "triple"]`.
My lookup table:
| Code | Description |
| --- | --- |
| SG | single |
| DB | double |
| TP | triple |
I know that if I can get each value into a separate column then I can do a lookup on each column and recombine them later, but if there's a way to map the codes to their new values directly, without having to split them into separate columns, even better! :-)
I know I can use `split()` and `join()` to convert between a string array and delimited text, so no problem there. I also know how to perform a lookup for a single value, so that's not a problem either.
The real problem is accessing each individual code value given that the number of codes can vary. If I knew that there would always be (for example) 3 codes, I would just use `<column name>[0]`, `<column name>[1]`, `<column name>[2]`, but I need to process each value regardless of the number in the array.
Upvotes: 1
Views: 308
Reputation: 11474
To perform this kind of operation, you need a combination of an ADF dataflow and a For-Each loop.
First, create a dataflow like below, and create two string-type parameters, `code` and `desc`, in the dataflow.
Give your input codes dataset as the source of the dataflow. If your source is a csv, give the configurations like below.
This is sample input data that I took in a csv file `mycodes.csv`:

```
mycodes
["SG", "DB", "TP"]
SG~DB~TP
["DB", "TP"]
DB
SG~TP
["SG","DB"]
```
Add a derived column transformation after the dataflow source and use the expression below, which uses the created parameters, for the `mycodes` column:

```
replace(mycodes, $code, $desc)
```
The dataflow sink should be the same source dataset. As mentioned, if it is a csv file, make sure you set the sink to output to a single file, with the same name as your source file, so each run overwrites the source.
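The dataflow's `replace()` is a plain substring replacement applied to every row. A minimal Python sketch of what one dataflow run does, assuming the two dataflow parameters are passed in as function arguments:

```python
def run_dataflow(rows, code, desc):
    """Simulate one dataflow run: replace a single code with its
    description in every row, regardless of the row's format."""
    return [row.replace(code, desc) for row in rows]
```

For example, `run_dataflow(['["SG", "DB", "TP"]', "SG~DB~TP"], "SG", "single")` rewrites just the `SG` occurrences and leaves the rest of each row untouched.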
Now, in the pipeline, give your lookup table to a Lookup activity (uncheck *First row only*) and pass the lookup output array to a For-Each activity. Make sure you check the *Sequential* checkbox, since each iteration depends on the previous run's output.
Inside the For-Each loop, add a Dataflow activity and give it the created dataflow. In the parameters section, give the pipeline expressions `@item().Code` and `@item().Description` for the `code` and `desc` parameters respectively.
This pipeline loops through all of your code and description rows; in each iteration it replaces one code with its description using the dataflow and writes the result back to the same source. At the end of the pipeline, the source file will contain the required result.
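The whole pipeline can be simulated in a few lines of Python to check the end result. This is only a sketch of the logic, not ADF code; `lookup_rows` stands in for the Lookup activity output and `rows` for the contents of `mycodes.csv`:

```python
# Lookup activity output: one row per code/description pair
lookup_rows = [
    {"Code": "SG", "Description": "single"},
    {"Code": "DB", "Description": "double"},
    {"Code": "TP", "Description": "triple"},
]

# Source file contents (mycodes.csv, header row omitted)
rows = ['["SG", "DB", "TP"]', "SG~DB~TP", '["DB", "TP"]',
        "DB", "SG~TP", '["SG","DB"]']

# For-Each (Sequential): each iteration runs the dataflow once
# and overwrites the source with the replaced rows
for item in lookup_rows:
    rows = [row.replace(item["Code"], item["Description"]) for row in rows]
```

After the loop, every code in every row has been rewritten to its description, in whichever format the row originally used.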
Upvotes: 0