Reputation: 2059
I need to create a big file by merging multiple files scattered across several subfolders in Azure Blob Storage. A transformation is also needed: each file contains a JSON array with a single element, so the final file will contain an array of all those JSON elements.
The final purpose is to process that big file in a Hadoop MapReduce job.
The layout of the original files is similar to this:
folder
- month-01
  - day-01
    - files...
- month-02
  - day-02
    - files...
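For illustration, the transformation described above can be sketched locally with plain Python (the file and folder names here are made up; the real files live in Blob Storage):

```python
import json
from pathlib import Path

def merge_single_element_arrays(paths):
    """Concatenate many one-element JSON arrays into a single array."""
    merged = []
    for path in paths:
        with open(path) as f:
            merged.extend(json.load(f))  # each file holds an array like [{...}]
    return merged

if __name__ == "__main__":
    # Simulate two source files, each containing a JSON array of one element.
    Path("day-01").mkdir(exist_ok=True)
    Path("day-01/a.json").write_text(json.dumps([{"id": 1}]))
    Path("day-01/b.json").write_text(json.dumps([{"id": 2}]))
    result = merge_single_element_arrays(sorted(Path("day-01").glob("*.json")))
    print(json.dumps(result))  # one array holding both elements
```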
Upvotes: 2
Views: 18917
Reputation: 23792
I did a test based on your description; please follow my steps.
My simulated data:
- test1.json resides in the folder date/day1
- test2.json resides in the folder date/day2
Source DataSet: set the file format setting to Array of Objects and the file path to the root path.

Sink DataSet: set the file format setting to Array of Objects and the file path to the file where you want to store the final data.

Create a Copy Activity and set the Copy behavior to Merge Files.
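If you author the pipeline in JSON rather than the portal UI, the Copy Activity above corresponds to a sketch like this (dataset names are placeholders; the key part is `"copyBehavior": "MergeFiles"` on the sink):

```json
{
  "name": "MergeJsonFiles",
  "type": "Copy",
  "inputs": [ { "referenceName": "SourceJsonDataset", "type": "DatasetReference" } ],
  "outputs": [ { "referenceName": "SinkJsonDataset", "type": "DatasetReference" } ],
  "typeProperties": {
    "source": { "type": "BlobSource", "recursive": true },
    "sink": { "type": "BlobSink", "copyBehavior": "MergeFiles" }
  }
}
```

The Array of Objects option in the portal maps to the JSON file pattern setting on the datasets; with `recursive` set to true on the source, files in all the day subfolders are picked up.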
Execution result:
The destination in my test is still Azure Blob Storage; you could refer to this link to learn about Hadoop's support for Azure Blob Storage.
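As a rough sketch of that Hadoop side (account and container names are placeholders): with the hadoop-azure module, you give Hadoop the storage key in core-site.xml and then address the merged blob through a `wasb://` URI.

```xml
<!-- core-site.xml: hypothetical account name and key -->
<property>
  <name>fs.azure.account.key.youraccount.blob.core.windows.net</name>
  <value>YOUR_STORAGE_ACCESS_KEY</value>
</property>
```

The merged file would then be reachable at a path like `wasb://yourcontainer@youraccount.blob.core.windows.net/output/merged.json`.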
Upvotes: 10