ulitosCoder

Reputation: 2059

Azure Data Factory: how to merge all files of a folder into one file

I need to create one big file by merging multiple files scattered across several subfolders in Azure Blob Storage. A transformation is also needed: each file contains a JSON array with a single element, so the final file should contain one array holding all of those elements.
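As a sketch (the actual field names may differ), each input file looks like this:

    [ { "id": 1, "value": "record from month-01/day-01" } ]

and the desired merged file is one array containing every element:

    [
      { "id": 1, "value": "record from month-01/day-01" },
      { "id": 2, "value": "record from month-02/day-02" }
    ]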

The final purpose is to process that big file in a Hadoop MapReduce job.

The layout of the original files is similar to this:

folder
  - month-01
    - day-01
      - files...
  - month-02
    - day-02
      - files...

Upvotes: 2

Views: 18917

Answers (1)

Jay Gong

Reputation: 23792

I did a test based on your description; please follow my steps.

My simulated data:

test1.json resides in the folder: date/day1


test2.json resides in the folder: date/day2

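As a sketch consistent with the question's description (each file is a single-element JSON array; the contents here are invented for illustration), the two test files could be:

    test1.json:

        [ { "id": 1, "source": "day1" } ]

    test2.json:

        [ { "id": 2, "source": "day2" } ]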

Source dataset: set the file format to Array of Objects and the file path to the root path.

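Expressed as pipeline JSON, the source dataset would look roughly like this (a sketch using the classic JsonFormat settings, where filePattern arrayOfObjects corresponds to "Array of Objects"; the linked service name and folder path are placeholders):

    {
      "name": "SourceDataset",
      "properties": {
        "type": "AzureBlob",
        "linkedServiceName": {
          "referenceName": "MyBlobStorageLinkedService",
          "type": "LinkedServiceReference"
        },
        "typeProperties": {
          "folderPath": "date",
          "format": {
            "type": "JsonFormat",
            "filePattern": "arrayOfObjects"
          }
        }
      }
    }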

Sink dataset: set the file format to Array of Objects and the file path to the file where you want to store the final data.

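The sink dataset has the same shape but points at a single output file (names are again placeholders):

    {
      "name": "SinkDataset",
      "properties": {
        "type": "AzureBlob",
        "linkedServiceName": {
          "referenceName": "MyBlobStorageLinkedService",
          "type": "LinkedServiceReference"
        },
        "typeProperties": {
          "folderPath": "output",
          "fileName": "merged.json",
          "format": {
            "type": "JsonFormat",
            "filePattern": "arrayOfObjects"
          }
        }
      }
    }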

Create a Copy activity and set the copy behavior to Merge Files.

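In JSON, the Copy activity looks roughly like this: recursive true on the source makes it walk the day subfolders, and copyBehavior MergeFiles on the sink concatenates everything into the single sink file (dataset names refer to the sketches above):

    {
      "name": "MergeJsonFiles",
      "type": "Copy",
      "inputs": [ { "referenceName": "SourceDataset", "type": "DatasetReference" } ],
      "outputs": [ { "referenceName": "SinkDataset", "type": "DatasetReference" } ],
      "typeProperties": {
        "source": { "type": "BlobSource", "recursive": true },
        "sink": { "type": "BlobSink", "copyBehavior": "MergeFiles" }
      }
    }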

Execution result:

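With the sample files sketched above, the merged blob would contain one array:

    [
      { "id": 1, "source": "day1" },
      { "id": 2, "source": "day2" }
    ]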

The destination of my test is still Azure Blob Storage; you could refer to this link to learn how Hadoop supports Azure Blob Storage.

Upvotes: 10
