Reputation: 51
I used a Copy Data activity in my pipeline to merge the files in my ADLS folder. However, in the sink dataset, the rows from the two files are mixed up.
For example, instead of A1,A2,A3,B1,B2,B3 I am getting something like A1,B1,B2,A2,B3,A3 (the order seems random; it changes every time I run the pipeline).
Is there a way to make the rows in the final file come out in order?
For example, File A is the earlier-dated file and File B is the latest file.
File A's data:
A1
A2
A3
File B's Data:
B1
B2
B3
What I want in the sink file's data:
A1
A2
A3
B1
B2
B3
This is the approach I followed: https://www.sqlservercentral.com/articles/merge-multiple-files-in-azure-data-factory
Upvotes: 0
Views: 602
Reputation: 5297
When I tried to replicate the scenario with similar inputs, I faced the same issue with the copy activity.
The mergeFiles copy behavior in ADF gives you no control over the order in which files are merged, nor over the order of the rows within them; it processes them in a nondeterministic order. That is why the merged data comes out in random order when you use the copy activity. When there are only two files, the following workaround with a data flow and a union transformation produces the required output:
I have taken two sources in the data flow, one per file:
Source1:
Source2:
Use a union transformation to merge the rows of the two sources with the following options:
Add a sink after the union transformation and configure it to set the name of the target file:
Finally, create a pipeline with a Data Flow activity that references the above data flow. It will run successfully, and the source files will be merged in the required order:
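For cases where a data flow is not an option, the same deterministic merge can be sketched outside ADF. This is a minimal local-filesystem illustration in plain Python (all file names here are hypothetical; against ADLS this logic would typically run in an Azure Function or a custom activity, reading the blobs instead of local files). Sorting the paths by name makes the earlier-dated file come first, and each file's rows keep their original order:

```python
import os
import tempfile

def merge_files_in_order(paths, out_path):
    """Concatenate the given files in order, preserving each file's row order."""
    with open(out_path, "w") as out:
        for p in paths:
            with open(p) as f:
                for line in f:
                    out.write(line if line.endswith("\n") else line + "\n")

# Demo with the A/B files from the question (paths are hypothetical).
tmp = tempfile.mkdtemp()
file_a = os.path.join(tmp, "fileA.csv")
file_b = os.path.join(tmp, "fileB.csv")
with open(file_a, "w") as f:
    f.write("A1\nA2\nA3\n")
with open(file_b, "w") as f:
    f.write("B1\nB2\nB3\n")

merged = os.path.join(tmp, "merged.csv")
# Sort by name so fileA (the earlier-dated file) is written before fileB.
merge_files_in_order(sorted([file_b, file_a]), merged)
with open(merged) as f:
    result = f.read().splitlines()
print(result)  # ['A1', 'A2', 'A3', 'B1', 'B2', 'B3']
```

Unlike the copy activity's mergeFiles behavior, the order here is explicit, so the output is the same on every run.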
Upvotes: 0