I have multiple folders with dynamic names in Azure Storage Explorer. Each folder has 3 files (A.csv, B.csv, and C.csv). How do I export A.csv and B.csv from each folder into another folder and append the last modified date to the file name (e.g. A012523.csv for a last modified date of 01-25-2023) using an ADF pipeline?
I used a Get Metadata activity to get the child items, then a Filter activity to keep the folders whose names begin with xxx and end with zip. The folders are already unzipped. I used a ForEach loop to iterate over each folder, and inside the ForEach loop I used a Get Metadata activity to get the item name and last modified date. Outside the ForEach loop, I used a Copy activity to copy the A.csv file from each folder to another Azure Storage Explorer folder. For the Copy activity source, I used the wildcard file path srcfolder/file*.zip/A.csv. The sink passes @variables('LastModified') as the FileName, and the sink dataset has a FileName parameter with Adestfolder/@dataset().FileName as the file path.
I expect only the renamed CSV files in Adestfolder (for example A012523.csv, A020823.csv, A051023.csv), but the pipeline copies each folder containing A.csv to the destination folder without renaming the file.
As you have only 3 files (A.csv, B.csv, C.csv) in every source folder, you can try the approach below.
After filtering the folder names, pass them to a ForEach activity. Inside the ForEach, create 3 Get Metadata activity -> Copy activity pairs (one for each of the 3 files in the folder), using dataset parameters.
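To make the Filter -> ForEach control flow concrete, here is a small Python sketch that emulates it locally. It is not ADF code: it assumes the Get Metadata childItems output is a list of {"name": ..., "type": ...} dicts, and the "File"/".zip" prefix and suffix mirror the Filter1 condition in the pipeline JSON (swap in your own prefix, such as xxx). As far as I know, ADF's startswith/endswith are not case-sensitive, so the sketch lowercases both sides.

```python
from typing import Dict, List

# The three fixed files in every folder (from the question).
FILES_PER_FOLDER = ["A.csv", "B.csv", "C.csv"]


def filter_folders(child_items: List[Dict[str, str]], prefix: str = "File",
                   suffix: str = ".zip") -> List[Dict[str, str]]:
    """Emulate Filter1: keep child items whose name starts with prefix and ends with suffix."""
    return [
        item
        for item in child_items
        if item["name"].lower().startswith(prefix.lower())
        and item["name"].lower().endswith(suffix.lower())
    ]


def plan_source_paths(child_items: List[Dict[str, str]]) -> List[str]:
    """Emulate ForEach1: one Get Metadata -> Copy pair per fixed file in each matching folder."""
    paths = []
    for item in filter_folders(child_items):
        for file_name in FILES_PER_FOLDER:
            # Equivalent to @concat(item().name,'/A.csv') and its B/C variants.
            paths.append(f"{item['name']}/{file_name}")
    return paths


if __name__ == "__main__":
    sample = [
        {"name": "File1.zip", "type": "Folder"},
        {"name": "File2.zip", "type": "Folder"},
        {"name": "archive", "type": "Folder"},
    ]
    for path in plan_source_paths(sample):
        print(path)
```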
Get Metadata activity:
Use a dataset with a parameter for the file name (set the dataset path only up to the main source folder), and pass this expression as the file name:
@concat(item().name,'/A.csv')
Then add a Copy activity after it. Use the same dataset (csv_file) and the same expression as above for the source of the Copy activity.
For the sink of the Copy activity, use another dataset (sinkcsv) with a parameter for the file name, and pass:
Adestfolder/@{concat('A', formatDateTime(activity('Get Metadata2').output.lastModified, 'MMddyy'), '.csv')}
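The sink file name expression can be sanity-checked outside ADF with a Python equivalent. This sketch assumes lastModified arrives as an ISO 8601 UTC string such as 2023-01-25T10:15:30Z (without fractional seconds); dated_sink_path is a hypothetical helper name. Note that the date is taken in UTC, so a file modified late in the evening in a timezone behind UTC can get the next day's date; ADF's convertFromUtc function could be applied before formatting if you need local dates.

```python
from datetime import datetime


def dated_sink_path(prefix: str, last_modified: str, folder: str = "Adestfolder") -> str:
    """Python mirror of folder/@{concat(prefix, formatDateTime(lastModified, 'MMddyy'), '.csv')}."""
    # Rewrite a trailing "Z" as +00:00 so fromisoformat accepts it on Python 3.7+.
    modified = datetime.fromisoformat(last_modified.replace("Z", "+00:00"))
    # ADF's 'MMddyy' corresponds to strftime's %m%d%y: 2-digit month, day, year.
    return f"{folder}/{prefix}{modified.strftime('%m%d%y')}.csv"


print(dated_sink_path("A", "2023-01-25T10:15:30Z"))  # Adestfolder/A012523.csv
print(dated_sink_path("B", "2023-02-08T23:59:59Z"))  # Adestfolder/B020823.csv
```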
Do the same for B.csv and C.csv.
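Put together, one ForEach iteration runs three Get Metadata -> Copy pairs. The Python sketch below reproduces that on a local filesystem, purely as an analogy: os.path.getmtime stands in for Get Metadata's lastModified field, shutil.copyfile stands in for the Copy activity, and copy_with_dated_names and demo are hypothetical names.

```python
import os
import shutil
import tempfile
from datetime import datetime, timezone


def copy_with_dated_names(src_root, folder_name, dest_dir, files=("A.csv", "B.csv", "C.csv")):
    """Local analogue of the three Get Metadata -> Copy pairs run in one ForEach iteration."""
    os.makedirs(dest_dir, exist_ok=True)
    written = []
    for file_name in files:
        src = os.path.join(src_root, folder_name, file_name)
        # "Get Metadata" step: last-modified time, read as UTC like ADF's lastModified.
        modified = datetime.fromtimestamp(os.path.getmtime(src), tz=timezone.utc)
        stem, ext = os.path.splitext(file_name)  # "A.csv" -> ("A", ".csv")
        target_name = f"{stem}{modified.strftime('%m%d%y')}{ext}"
        # "Copy" step: write the content under the dated name (overwrites an existing file).
        shutil.copyfile(src, os.path.join(dest_dir, target_name))
        written.append(target_name)
    return written


def demo():
    """Build one fake folder whose files were 'modified' on 2023-01-25 (UTC) and copy it."""
    with tempfile.TemporaryDirectory() as root:
        folder = os.path.join(root, "src", "File1.zip")
        os.makedirs(folder)
        stamp = datetime(2023, 1, 25, 12, 0, tzinfo=timezone.utc).timestamp()
        for name in ("A.csv", "B.csv", "C.csv"):
            path = os.path.join(folder, name)
            with open(path, "w") as f:
                f.write("col1,col2\n1,2\n")
            os.utime(path, (stamp, stamp))  # set access and modification time
        dest = os.path.join(root, "Adestfolder")
        written = copy_with_dated_names(os.path.join(root, "src"), "File1.zip", dest)
        # Sanity check: only the renamed files end up in the destination folder.
        assert sorted(os.listdir(dest)) == written
        return written


if __name__ == "__main__":
    print(demo())  # ['A012523.csv', 'B012523.csv', 'C012523.csv']
```

Because the target name only carries the date, two folders whose A.csv files were modified on the same day map to the same name, and the later copy would most likely overwrite the earlier one; the same applies to the sink expression in the pipeline.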
My Pipeline JSON for reference:
{
    "name": "pipeline1",
    "properties": {
        "activities": [
            {
                "name": "Get Metadata1",
                "type": "GetMetadata",
                "dependsOn": [],
                "policy": {
                    "timeout": "0.12:00:00",
                    "retry": 0,
                    "retryIntervalInSeconds": 30,
                    "secureOutput": false,
                    "secureInput": false
                },
                "userProperties": [],
                "typeProperties": {
                    "dataset": {
                        "referenceName": "sourcecsv",
                        "type": "DatasetReference"
                    },
                    "fieldList": [
                        "childItems"
                    ],
                    "storeSettings": {
                        "type": "AzureBlobFSReadSettings",
                        "enablePartitionDiscovery": false
                    },
                    "formatSettings": {
                        "type": "DelimitedTextReadSettings"
                    }
                }
            },
            {
                "name": "Filter1",
                "type": "Filter",
                "dependsOn": [
                    {
                        "activity": "Get Metadata1",
                        "dependencyConditions": [
                            "Succeeded"
                        ]
                    }
                ],
                "userProperties": [],
                "typeProperties": {
                    "items": {
                        "value": "@activity('Get Metadata1').output.childItems",
                        "type": "Expression"
                    },
                    "condition": {
                        "value": "@and(startswith(item().name, 'File'), endswith(item().name,'.zip'))",
                        "type": "Expression"
                    }
                }
            },
            {
                "name": "ForEach1",
                "type": "ForEach",
                "dependsOn": [
                    {
                        "activity": "Filter1",
                        "dependencyConditions": [
                            "Succeeded"
                        ]
                    }
                ],
                "userProperties": [],
                "typeProperties": {
                    "items": {
                        "value": "@activity('Filter1').output.value",
                        "type": "Expression"
                    },
                    "isSequential": true,
                    "activities": [
                        {
                            "name": "Get Metadata2",
                            "type": "GetMetadata",
                            "dependsOn": [],
                            "policy": {
                                "timeout": "0.12:00:00",
                                "retry": 0,
                                "retryIntervalInSeconds": 30,
                                "secureOutput": false,
                                "secureInput": false
                            },
                            "userProperties": [],
                            "typeProperties": {
                                "dataset": {
                                    "referenceName": "csv_file",
                                    "type": "DatasetReference",
                                    "parameters": {
                                        "filename": {
                                            "value": "@concat(item().name,'/A.csv')",
                                            "type": "Expression"
                                        }
                                    }
                                },
                                "fieldList": [
                                    "lastModified"
                                ],
                                "storeSettings": {
                                    "type": "AzureBlobFSReadSettings",
                                    "enablePartitionDiscovery": false
                                },
                                "formatSettings": {
                                    "type": "DelimitedTextReadSettings"
                                }
                            }
                        },
                        {
                            "name": "Get Metadata2_copy1",
                            "type": "GetMetadata",
                            "dependsOn": [],
                            "policy": {
                                "timeout": "0.12:00:00",
                                "retry": 0,
                                "retryIntervalInSeconds": 30,
                                "secureOutput": false,
                                "secureInput": false
                            },
                            "userProperties": [],
                            "typeProperties": {
                                "dataset": {
                                    "referenceName": "csv_file",
                                    "type": "DatasetReference",
                                    "parameters": {
                                        "filename": {
                                            "value": "@concat(item().name,'/B.csv')",
                                            "type": "Expression"
                                        }
                                    }
                                },
                                "fieldList": [
                                    "lastModified"
                                ],
                                "storeSettings": {
                                    "type": "AzureBlobFSReadSettings",
                                    "enablePartitionDiscovery": false
                                },
                                "formatSettings": {
                                    "type": "DelimitedTextReadSettings"
                                }
                            }
                        },
                        {
                            "name": "Get Metadata2_copy2",
                            "type": "GetMetadata",
                            "dependsOn": [],
                            "policy": {
                                "timeout": "0.12:00:00",
                                "retry": 0,
                                "retryIntervalInSeconds": 30,
                                "secureOutput": false,
                                "secureInput": false
                            },
                            "userProperties": [],
                            "typeProperties": {
                                "dataset": {
                                    "referenceName": "csv_file",
                                    "type": "DatasetReference",
                                    "parameters": {
                                        "filename": {
                                            "value": "@concat(item().name,'/C.csv')",
                                            "type": "Expression"
                                        }
                                    }
                                },
                                "fieldList": [
                                    "lastModified"
                                ],
                                "storeSettings": {
                                    "type": "AzureBlobFSReadSettings",
                                    "enablePartitionDiscovery": false
                                },
                                "formatSettings": {
                                    "type": "DelimitedTextReadSettings"
                                }
                            }
                        },
                        {
                            "name": "Copy data1",
                            "type": "Copy",
                            "dependsOn": [
                                {
                                    "activity": "Get Metadata2",
                                    "dependencyConditions": [
                                        "Succeeded"
                                    ]
                                }
                            ],
                            "policy": {
                                "timeout": "0.12:00:00",
                                "retry": 0,
                                "retryIntervalInSeconds": 30,
                                "secureOutput": false,
                                "secureInput": false
                            },
                            "userProperties": [],
                            "typeProperties": {
                                "source": {
                                    "type": "DelimitedTextSource",
                                    "storeSettings": {
                                        "type": "AzureBlobFSReadSettings",
                                        "recursive": true,
                                        "enablePartitionDiscovery": false
                                    },
                                    "formatSettings": {
                                        "type": "DelimitedTextReadSettings"
                                    }
                                },
                                "sink": {
                                    "type": "DelimitedTextSink",
                                    "storeSettings": {
                                        "type": "AzureBlobFSWriteSettings"
                                    },
                                    "formatSettings": {
                                        "type": "DelimitedTextWriteSettings",
                                        "quoteAllText": true,
                                        "fileExtension": ".txt"
                                    }
                                },
                                "enableStaging": false,
                                "translator": {
                                    "type": "TabularTranslator",
                                    "typeConversion": true,
                                    "typeConversionSettings": {
                                        "allowDataTruncation": true,
                                        "treatBooleanAsNumber": false
                                    }
                                }
                            },
                            "inputs": [
                                {
                                    "referenceName": "csv_file",
                                    "type": "DatasetReference",
                                    "parameters": {
                                        "filename": {
                                            "value": "@concat(item().name,'/A.csv')",
                                            "type": "Expression"
                                        }
                                    }
                                }
                            ],
                            "outputs": [
                                {
                                    "referenceName": "sinkcsv",
                                    "type": "DatasetReference",
                                    "parameters": {
                                        "filename": {
                                            "value": "Adestfolder/@{concat('A', formatDateTime(activity('Get Metadata2').output.lastModified, 'MMddyy'), '.csv')}",
                                            "type": "Expression"
                                        }
                                    }
                                }
                            ]
                        },
                        {
                            "name": "Copy data2",
                            "type": "Copy",
                            "dependsOn": [
                                {
                                    "activity": "Get Metadata2_copy1",
                                    "dependencyConditions": [
                                        "Succeeded"
                                    ]
                                }
                            ],
                            "policy": {
                                "timeout": "0.12:00:00",
                                "retry": 0,
                                "retryIntervalInSeconds": 30,
                                "secureOutput": false,
                                "secureInput": false
                            },
                            "userProperties": [],
                            "typeProperties": {
                                "source": {
                                    "type": "DelimitedTextSource",
                                    "storeSettings": {
                                        "type": "AzureBlobFSReadSettings",
                                        "recursive": true,
                                        "enablePartitionDiscovery": false
                                    },
                                    "formatSettings": {
                                        "type": "DelimitedTextReadSettings"
                                    }
                                },
                                "sink": {
                                    "type": "DelimitedTextSink",
                                    "storeSettings": {
                                        "type": "AzureBlobFSWriteSettings"
                                    },
                                    "formatSettings": {
                                        "type": "DelimitedTextWriteSettings",
                                        "quoteAllText": true,
                                        "fileExtension": ".txt"
                                    }
                                },
                                "enableStaging": false,
                                "translator": {
                                    "type": "TabularTranslator",
                                    "typeConversion": true,
                                    "typeConversionSettings": {
                                        "allowDataTruncation": true,
                                        "treatBooleanAsNumber": false
                                    }
                                }
                            },
                            "inputs": [
                                {
                                    "referenceName": "csv_file",
                                    "type": "DatasetReference",
                                    "parameters": {
                                        "filename": {
                                            "value": "@concat(item().name,'/B.csv')",
                                            "type": "Expression"
                                        }
                                    }
                                }
                            ],
                            "outputs": [
                                {
                                    "referenceName": "sinkcsv",
                                    "type": "DatasetReference",
                                    "parameters": {
                                        "filename": {
                                            "value": "Adestfolder/@{concat('B', formatDateTime(activity('Get Metadata2_copy1').output.lastModified, 'MMddyy'), '.csv')}",
                                            "type": "Expression"
                                        }
                                    }
                                }
                            ]
                        },
                        {
                            "name": "Copy data3",
                            "type": "Copy",
                            "dependsOn": [
                                {
                                    "activity": "Get Metadata2_copy2",
                                    "dependencyConditions": [
                                        "Succeeded"
                                    ]
                                }
                            ],
                            "policy": {
                                "timeout": "0.12:00:00",
                                "retry": 0,
                                "retryIntervalInSeconds": 30,
                                "secureOutput": false,
                                "secureInput": false
                            },
                            "userProperties": [],
                            "typeProperties": {
                                "source": {
                                    "type": "DelimitedTextSource",
                                    "storeSettings": {
                                        "type": "AzureBlobFSReadSettings",
                                        "recursive": true,
                                        "enablePartitionDiscovery": false
                                    },
                                    "formatSettings": {
                                        "type": "DelimitedTextReadSettings"
                                    }
                                },
                                "sink": {
                                    "type": "DelimitedTextSink",
                                    "storeSettings": {
                                        "type": "AzureBlobFSWriteSettings"
                                    },
                                    "formatSettings": {
                                        "type": "DelimitedTextWriteSettings",
                                        "quoteAllText": true,
                                        "fileExtension": ".txt"
                                    }
                                },
                                "enableStaging": false,
                                "translator": {
                                    "type": "TabularTranslator",
                                    "typeConversion": true,
                                    "typeConversionSettings": {
                                        "allowDataTruncation": true,
                                        "treatBooleanAsNumber": false
                                    }
                                }
                            },
                            "inputs": [
                                {
                                    "referenceName": "csv_file",
                                    "type": "DatasetReference",
                                    "parameters": {
                                        "filename": {
                                            "value": "@concat(item().name,'/C.csv')",
                                            "type": "Expression"
                                        }
                                    }
                                }
                            ],
                            "outputs": [
                                {
                                    "referenceName": "sinkcsv",
                                    "type": "DatasetReference",
                                    "parameters": {
                                        "filename": {
                                            "value": "Adestfolder/@{concat('C', formatDateTime(activity('Get Metadata2_copy2').output.lastModified, 'MMddyy'), '.csv')}",
                                            "type": "Expression"
                                        }
                                    }
                                }
                            ]
                        }
                    ]
                }
            }
        ],
        "annotations": []
    }
}
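Since the three pairs differ only in the file letter and activity names, the inner ForEach activities can also be generated instead of copied by hand. This Python sketch (activity_pair is a hypothetical helper, not an ADF feature) builds a trimmed version of them: it omits policy, storeSettings, formatSettings, and the Copy source/sink/translator settings shown above, so those would have to be merged back in before the output is usable as a pipeline definition.

```python
import json


def activity_pair(letter: str, index: int) -> list:
    """Hypothetical helper: trimmed Get Metadata -> Copy pair for one fixed file (e.g. 'A')."""
    meta_name = "Get Metadata2" if index == 0 else f"Get Metadata2_copy{index}"
    source_path = f"@concat(item().name,'/{letter}.csv')"
    # Same sink expression as in the pipeline: Adestfolder/<letter><MMddyy>.csv
    sink_path = (
        f"Adestfolder/@{{concat('{letter}', "
        f"formatDateTime(activity('{meta_name}').output.lastModified, 'MMddyy'), '.csv')}}"
    )

    def file_param(value: str) -> dict:
        # Dataset parameter block shared by csv_file and sinkcsv references.
        return {"filename": {"value": value, "type": "Expression"}}

    get_metadata = {
        "name": meta_name,
        "type": "GetMetadata",
        "dependsOn": [],
        "typeProperties": {
            "dataset": {"referenceName": "csv_file", "type": "DatasetReference",
                        "parameters": file_param(source_path)},
            "fieldList": ["lastModified"],
        },
    }
    copy = {
        "name": f"Copy data{index + 1}",
        "type": "Copy",
        "dependsOn": [{"activity": meta_name, "dependencyConditions": ["Succeeded"]}],
        "inputs": [{"referenceName": "csv_file", "type": "DatasetReference",
                    "parameters": file_param(source_path)}],
        "outputs": [{"referenceName": "sinkcsv", "type": "DatasetReference",
                     "parameters": file_param(sink_path)}],
    }
    return [get_metadata, copy]


inner_activities = [a for i, letter in enumerate("ABC") for a in activity_pair(letter, i)]
print(json.dumps(inner_activities, indent=4))
```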
[Screenshot: result files after pipeline execution]
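If the folders contain more files than these three fixed ones, you need another loop inside the ForEach. Because ADF does not allow a ForEach activity to be nested directly inside another ForEach, you can use an Execute Pipeline activity inside the ForEach to call a child pipeline that contains the inner loop, passing the folder path as a parameter.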
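The parent/child pattern can be sketched in Python with an in-memory stand-in for the storage container (a hypothetical dict of blob path -> lastModified string, assumed to be ISO 8601 UTC). parent_pipeline plays the parent (Filter + ForEach), and each call to child_pipeline corresponds to an Execute Pipeline activity that passes the folder name as a parameter; the child then loops over every CSV in that folder, so folders with files beyond A, B, and C are handled too.

```python
from datetime import datetime
from typing import Dict

# Hypothetical in-memory stand-in for the container: blob path -> lastModified (UTC, ISO 8601).
Storage = Dict[str, str]


def child_pipeline(storage: Storage, folder_path: str) -> Dict[str, str]:
    """Child pipeline: receives folder_path as a parameter and loops over every CSV in it."""
    plan = {}
    for blob_path, last_modified in sorted(storage.items()):
        folder, _, file_name = blob_path.partition("/")
        if folder != folder_path or not file_name.lower().endswith(".csv"):
            continue
        stem = file_name[: -len(".csv")]
        modified = datetime.fromisoformat(last_modified.replace("Z", "+00:00"))
        plan[blob_path] = f"Adestfolder/{stem}{modified.strftime('%m%d%y')}.csv"
    return plan


def parent_pipeline(storage: Storage, prefix: str = "File", suffix: str = ".zip") -> Dict[str, str]:
    """Parent pipeline: Filter + ForEach whose only inner activity runs the child pipeline."""
    plan = {}
    for folder in sorted({path.split("/", 1)[0] for path in storage}):
        name = folder.lower()
        if name.startswith(prefix.lower()) and name.endswith(suffix.lower()):
            # Equivalent of an Execute Pipeline activity passing the folder as a parameter.
            plan.update(child_pipeline(storage, folder_path=folder))
    return plan


if __name__ == "__main__":
    storage = {
        "File1.zip/A.csv": "2023-01-25T10:15:30Z",
        "File1.zip/D.csv": "2023-02-08T08:00:00Z",
        "File2.zip/A.csv": "2023-05-10T08:00:00Z",
        "other/A.csv": "2023-01-01T00:00:00Z",
    }
    for src, dest in parent_pipeline(storage).items():
        print(src, "->", dest)
```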