Reputation: 191
I have a daily export set up for several subscriptions - the files export like so,
with 7 different directories within daily. I'm simply trying to rename the files to get rid of the underscore for data flows.
My parent pipeline looks like so:
Get Metadata gets the folder names and, for each, invokes the child pipeline like so:
Here are the screen grabs of the child pipeline.
Copy data within ForEach1 -- the source:
And now the sink - this is where I want to rename the file. The first time I debugged, it simply copied the files to the correct place with a .txt extension; the next time it got the extension right, but it is not renaming the file.
I replaced @replace(item().name, '_', '-')
with @replace(activity('FileInfo').output.itemName, '_','-')
and got the following error:
The expression '@replace(activity('FileInfo').output.itemName, '_','-')' cannot be evaluated because property 'itemName' doesn't exist, available properties are 'childItems, effectiveIntegrationRuntime, executionDuration, durationInQueue, billingReference'.
So I then replaced that with @replace(activity('FileInfo').output.childItems, '_', '-'), but that gives the following error: Cannot fit childItems return type into the function parameter string.
I'm not sure where to go from here.
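For reference, the childItems property returned by Get Metadata is an array of objects with name and type fields (itemName only appears in the output when the Item name field is added to the Get Metadata field list), which is why replace() rejects it - it expects a string. A minimal sketch of the expressions involved, assuming the ForEach iterates over that array (anything not shown in the screenshots above is an assumption):

ForEach items:                @activity('FileInfo').output.childItems
One iteration's value:        @item()              e.g. { "name": "this_file.csv", "type": "File" }
That item's name:             @item().name
Renamed value for the sink:   @replace(item().name, '_', '-')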
Edit 7/14: making the change from the answer below.
Here is my linked service for the sink dataset with the parameter renamedFile:
Here is the sink on copy data1 for the child_Rename pipeline; it grayed out the file extension, as mentioned:
Now here is the sink container after running the pipeline:
This is the directory structure of the source data - it's dynamically created from the scheduled daily Azure exports:
Here is the output of Get Metadata - FileInfo - from the child pipeline:
{
"childItems": [
{
"name": "daily",
"type": "Folder"
}
],
"effectiveIntegrationRuntime": "integrationRuntime1 (Central US)",
"executionDuration": 0,
"durationInQueue": {
"integrationRuntimeQueue": 0
},
"billingReference": {
"activityType": "PipelineActivity",
"billableDuration": [
{
"meterType": "AzureIR",
"duration": 0.016666666666666666,
"unit": "Hours"
}
]
}
}
allsubs - source container
daily - directory created by the scheduled export
sub1 - subN - the different subs with scheduled exports
previous-month -> this-month - monthly folders are created automatically
this_fileXX.csv - files are automatically generated with the underscore in the name - it is my understanding that data flows cannot handle these characters in the file name
allsubs/
└── daily/
    ├── sub1/
    │   ├── previous-month/
    │   │   ├── this_file.csv
    │   │   └── this_file1.csv
    │   ├── previous-month/
    │   │   ├── this_file11.csv
    │   │   └── this_file12.csv
    │   └── this-month/
    └── subN/
        ├── previous-month/
        ├── previous-month/
        └── this-month/
            └── this_fileXX.csv
I think I'm getting closer, but there are still some small errors I do not see.
The pipeline now moves all the files from the container allsubs to the container renamed-files, but it is not renaming the files - it looks like so:
Get Metadata - from the dataset allContainers, it retrieves the folders with Child Items.
Dataset allContainers shown (preview works, linked service works, no parameters in this dataset).
Next, the ForEach activity calls the output of Get Metadata
for the items: @activity('Get Metadata1').output.childItems
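For reference, that wiring corresponds roughly to a ForEach definition like the one below in the pipeline JSON. This is only a sketch: the copy activity would sit under typeProperties.activities (omitted here), and isSequential is an assumption - running the loop sequentially is just one way to avoid several iterations writing to the same sink file, which is what the error further down complains about.

{
    "name": "ForEach1",
    "type": "ForEach",
    "dependsOn": [
        { "activity": "Get Metadata1", "dependencyConditions": [ "Succeeded" ] }
    ],
    "typeProperties": {
        "items": {
            "value": "@activity('Get Metadata1').output.childItems",
            "type": "Expression"
        },
        "isSequential": true
    }
}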
Next shown is the copy data within the ForEach.
The source is the allContainers dataset with the wildcard file path selected and recursive selected; due to the following error, max concurrent connections was set to 1 - but this did not resolve the error.
error message:
Failure happened on 'Sink' side.
ErrorCode=AzureStorageOperationFailedConcurrentWrite,
'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,
Message=Error occurred when trying to upload a file. It's possible because you have multiple concurrent copy activities runs writing to the same file 'renamed-files/rlcosts51122/20220601-20220630/rlcosts51122_082dd29b-95b2-4da5-802a-935d762e89d8.csv'. Check your ADF configuration.,
Source=Microsoft.DataTransfer.ClientLibrary,
''Type=Microsoft.WindowsAzure.Storage.StorageException,
Message=The remote server returned an error: (400) Bad Request.,
Source=Microsoft.WindowsAzure.Storage,
StorageExtendedMessage=The specified block list is invalid.
RequestId:b519219f-601e-000d-6c4c-9c9c5e000000
Time:2022-07-20T15:23:51.4342693Z,,
''Type=System.Net.WebException,
Message=The remote server returned an error: (400) Bad Request.,
Source=Microsoft.WindowsAzure.Storage,'
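For context, the wildcard source described just before that error corresponds roughly to store settings like the following in the copy activity JSON. Treat it as a sketch: the DelimitedText format and the wildcard values are assumptions; only recursive and maxConcurrentConnections reflect settings mentioned above.

"source": {
    "type": "DelimitedTextSource",
    "storeSettings": {
        "type": "AzureBlobFSReadSettings",
        "recursive": true,
        "wildcardFolderPath": "daily/*",
        "wildcardFileName": "*.csv",
        "maxConcurrentConnections": 1
    },
    "formatSettings": {
        "type": "DelimitedTextReadSettings"
    }
}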
Copy data sink - the dataset is dsRenamesink; it's simply another container in a different storage account. The linked service is set up correctly, and it has the parameter renamedFile, but I suspect this is the source of my error. Still testing that.
Here's the sink in the copy data, where the renamed file is passed the iterator from ForEach1 like so:
@replace(item().name,'_','renameworked')
So the underscore would be replaced with 'renameworked' - easy enough to test.
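As a quick check of what that expression actually receives, here is a worked evaluation using a folder entry like the one in the Get Metadata output shown earlier (the entry is illustrative - in this pipeline the childItems are whatever folders Get Metadata1 finds in the allsubs container):

item()                                      ->  { "name": "daily", "type": "Folder" }
item().name                                 ->  "daily"
@replace(item().name,'_','renameworked')    ->  "daily"   (no underscore, so nothing changes)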
The errors look to be consistent for the 7 failures, which was shown above as 'failure happened on the sink side'.
However, going into the storage account sink, I can see that all of the files from the source were copied over to the sink, but the files were not renamed, like so:
error messages:
{
"dataRead": 28901858,
"dataWritten": 10006989,
"filesRead": 4,
"filesWritten": 0,
"sourcePeakConnections": 1,
"sinkPeakConnections": 1,
"copyDuration": 7,
"throughput": 4032.067,
"errors": [
{
"Code": 24107,
"Message": "Failure happened on 'Sink' side. ErrorCode=AzureStorageOperationFailedConcurrentWrite,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Error occurred when trying to upload a file. It's possible because you have multiple concurrent copy activities runs writing to the same file 'renamed-files/rlcosts51122/20220601-20220630/rlcosts51122_082dd29b-95b2-4da5-802a-935d762e89d8.csv'. Check your ADF configuration.,Source=Microsoft.DataTransfer.ClientLibrary,''Type=Microsoft.WindowsAzure.Storage.StorageException,Message=The remote server returned an error: (400) Bad Request.,Source=Microsoft.WindowsAzure.Storage,StorageExtendedMessage=The specified block list is invalid.\nRequestId:b519219f-601e-000d-6c4c-9c9c5e000000\nTime:2022-07-20T15:23:51.4342693Z,,''Type=System.Net.WebException,Message=The remote server returned an error: (400) Bad Request.,Source=Microsoft.WindowsAzure.Storage,'",
"EventType": 0,
"Category": 5,
"Data": {
"FailureInitiator": "Sink"
},
"MsgId": null,
"ExceptionType": null,
"Source": null,
"StackTrace": null,
"InnerEventInfos": []
}
],
"effectiveIntegrationRuntime": "AutoResolveIntegrationRuntime (Central US)",
"usedDataIntegrationUnits": 4,
"billingReference": {
"activityType": "DataMovement",
"billableDuration": [
{
"meterType": "AzureIR",
"duration": 0.06666666666666667,
"unit": "DIUHours"
}
]
},
"usedParallelCopies": 1,
"executionDetails": [
{
"source": {
"type": "AzureBlobFS",
"region": "Central US"
},
"sink": {
"type": "AzureBlobStorage"
},
"status": "Failed",
"start": "Jul 20, 2022, 10:23:44 am",
"duration": 7,
"usedDataIntegrationUnits": 4,
"usedParallelCopies": 1,
"profile": {
"queue": {
"status": "Completed",
"duration": 3
},
"transfer": {
"status": "Completed",
"duration": 2,
"details": {
"listingSource": {
"type": "AzureBlobFS",
"workingDuration": 0
},
"readingFromSource": {
"type": "AzureBlobFS",
"workingDuration": 0
},
"writingToSink": {
"type": "AzureBlobStorage",
"workingDuration": 0
}
}
}
},
"detailedDurations": {
"queuingDuration": 3,
"transferDuration": 2
}
}
],
"dataConsistencyVerification": {
"VerificationResult": "NotVerified"
},
"durationInQueue": {
"integrationRuntimeQueue": 0
}
}
All I wanted to do was remove the underscore from the file name to work with data flows... I'm not sure what else to try next.
It appears that now I have been able to copy and rename some of the files by
changing the sink dataset as follows:
@concat(replace(dataset().renamedFile,'_','-'),'',formatDateTime(utcnow(),'yyyyMMddHHmmss'),'.csv')
and removing this parameter from the sink in the copy activity.
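To make that sink expression concrete, here is what it evaluates to for a hypothetical parameter value (the input name and the timestamp are made up for illustration; note the '' argument is an empty string and contributes nothing):

dataset().renamedFile                          ->  this_file        (hypothetical)
replace(dataset().renamedFile,'_','-')         ->  this-file
formatDateTime(utcnow(),'yyyyMMddHHmmss')      ->  20220720152351   (example timestamp)
@concat(...)                                   ->  this-file20220720152351.csv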
Upon debugging this pipeline, I get 1 file in the sink and it is named correctly, but there is still something wrong.
Further updating to be closer to the original answer - sink dataset:
Copy data activity in the sink - the concat works:
Now, after debugging, I'm left with 1 file for each of the subs - so there is something still not quite correct.
Upvotes: 0
Views: 717
Reputation: 2764
I reproduced the same thing in my environment.
Go to the sink dataset, click, and open it. First create a parameter and add dynamic content; I used this expression: @dataset().sinkfilename
In the copy activity sink, under dataset properties, pass the filename value using the expression @replace(item().name,'_','-') to replace _ with -.
When you create a dataset parameter to pass the filename, the File extension property is automatically disabled.
When the pipeline runs, you can see that the file name has been renamed accordingly.
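Putting those two steps together, the wiring looks roughly like this. It is a sketch only - the JSON layout follows the standard Azure Blob Storage delimited-text dataset schema rather than being copied from the screenshots, and the container name is taken from the question.

Sink dataset (parameter plus dynamic file name):

{
    "properties": {
        "type": "DelimitedText",
        "parameters": {
            "sinkfilename": { "type": "string" }
        },
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "container": "renamed-files",
                "fileName": {
                    "value": "@dataset().sinkfilename",
                    "type": "Expression"
                }
            }
        }
    }
}

Copy activity sink, dataset properties:

sinkfilename  =  @replace(item().name,'_','-')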
Upvotes: -1