Reputation: 3619
I have couple of 50GB+ csv files in azure datalake store saved under partitioned folders like -
source/table/partition1/file1.csv
source/table/partition2/file2.csv
...
source/table/partitionN/fileN.csv
Files have same structure.I want to merge/append all these files together using Azure Powershell commands and produce one large file in new location without harming the original files.
I tried following command -
Join-AzureRmDataLakeStoreItem -AccountName "DatalakeStoreName"
-Paths "source/table/partition1/file1.csv",
"source/table/partition2/file2.csv"
-Destination "/Merged/table/final.csv"
But this destroyed the original files and just created new final.csv I am aware that, there is Azure Data factory available but it requires multiple code objects to be deployed and I am dealing with 7000+ feeds like this, so ideally I would like to run this merging activity via one powershell script.
Does any one know any efficient and different solution to this problem which keeps original files intact?
Upvotes: 0
Views: 1153
Reputation: 511
You can do this with a U-SQL script defined right in Powershell. Just make sure you also have an Azure Data Lake Analytics account. A sample PowerShell script with the U-SQL script in line:
#You can also save the script locally on a file and use -ScriptPath instead of -Script
$usqlScript = @"
@extract =
SELECT
column1 string,
column2 int,
#... list all columns and data types
columnN string,
partition string,
fileNumber int
FROM source/table/{partition}/file{fileNumber}.csv
OUTPUT @extract
TO "combinedFiles.csv"
USING Outputters.Csv();
"@
$adla = "DataLakeAnalyticsAccountName"
Submit-AzureRmDataLakeAnalyticsJob -AccountName $adla -Script $usqlScript -Name "JobName"
The fileset source/table/{partition}/file{fileNumber}.csv will extract every file who's path matches that pattern, and save the wildcards {partition} and {fileNumber} as columns as well so you don't have to lose that information when consolidating your data. And your original files will remain untouched. Let me know if you have other questions!
Upvotes: 2