robert_553z8

Renaming files in a nested directory with Azure Data Factory

I have a daily export set up for several subscriptions. The files export like so:

[screenshot: exported files]

There are 7 different directories within daily. I'm simply trying to rename the files to remove the underscore so they work with data flows.

My parent pipeline looks like this:

[screenshot: parent pipeline]

A Get Metadata activity gets the folder names, and a ForEach invokes the child pipeline for each folder:

[screenshot: ForEach invoking the child pipeline]
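For reference, an Execute Pipeline call inside a ForEach that is driven by Get Metadata child items looks roughly like this in the underlying JSON (a sketch only; the 'Get Folder Names' activity name and the folderName parameter are placeholders, not my exact setup):

{
    "name": "ForEach1",
    "type": "ForEach",
    "typeProperties": {
        "items": {
            "value": "@activity('Get Folder Names').output.childItems",
            "type": "Expression"
        },
        "activities": [
            {
                "name": "Execute child_Rename",
                "type": "ExecutePipeline",
                "typeProperties": {
                    "pipeline": { "referenceName": "child_Rename", "type": "PipelineReference" },
                    "parameters": { "folderName": "@item().name" }
                }
            }
        ]
    }
}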

Here are the screen grabs of the child pipeline:

[screenshot: child pipeline]

The Copy Data activity within ForEach1 -- the source:

[screenshot: copy data source]

And now the sink. This is where I want to rename the file. The first time I debugged, it simply copied the files to the correct place but with a .txt extension; the next time it got the extension right, but it is not renaming the file.

[screenshot: copy data sink]

I replaced @replace(item().name, '_', '-') with @replace(activity('FileInfo').output.itemName, '_','-') and got the following error:

The expression '@replace(activity('FileInfo').output.itemName, '_','-')' cannot be evaluated because property 'itemName' doesn't exist, available properties are 'childItems, effectiveIntegrationRuntime, executionDuration, durationInQueue, billingReference'.

So then I replaced that with @replace(activity('FileInfo').output.childItems, '_', '-'), but that gives the following error: Cannot fit childItems return type into the function parameter string.

I'm not sure where to go from here.
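From the two errors, it looks like the Get Metadata output has no itemName property and childItems is an array of { name, type } objects rather than a string, so (as far as I understand) it has to be iterated with a ForEach whose items are bound to that output, roughly like this fragment:

"items": {
    "value": "@activity('FileInfo').output.childItems",
    "type": "Expression"
}

and then the per-item expression inside the loop would be @replace(item().name, '_', '-').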

Edit 7/14: making the change from the answer below.

Here is my linked service for the sink dataset with the parameter renamedFile:

[screenshot: sink dataset linked service]

[screenshot: renamedFile parameter]
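In the dataset JSON, the parameter shows up roughly like this (a sketch of just the relevant fragment):

"parameters": {
    "renamedFile": {
        "type": "string"
    }
}

As far as I understand, the parameter only has an effect if it is also referenced in the dataset's file name field via dynamic content, e.g. @dataset().renamedFile.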

Here is the sink on Copy Data1 for the child_Rename pipeline; it grayed out the file extension, as mentioned in the answer:

[screenshot: copy data sink settings]

Now here is the sink container after running the pipeline:

[screenshot: sink container]

This is the directory structure of the source data; it's dynamically created by the scheduled daily Azure exports:

[screenshot: source directory structure]

Here is the output of the Get Metadata activity (FileInfo) from the child pipeline:

{
    "childItems": [
        {
            "name": "daily",
            "type": "Folder"
        }
    ],
    "effectiveIntegrationRuntime": "integrationRuntime1 (Central US)",
    "executionDuration": 0,
    "durationInQueue": {
        "integrationRuntimeQueue": 0
    },
    "billingReference": {
        "activityType": "PipelineActivity",
        "billableDuration": [
            {
                "meterType": "AzureIR",
                "duration": 0.016666666666666666,
                "unit": "Hours"
            }
        ]
    }
}
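So the Child Items field only lists the immediate children of the folder the dataset points to, which here is just the daily folder. For comparison, if the dataset pointed at one of the month folders that contain the CSVs, the entries would look more like this (same output shape, file names taken from the structure below), which is the kind of name I actually want to feed into replace():

"childItems": [
    { "name": "this_file.csv", "type": "File" },
    { "name": "this_file1.csv", "type": "File" }
]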

allsubs -- the source container
daily -- directory created by the scheduled export
sub1 - subN -- the different subs with scheduled exports
previous-month -> this-month -- monthly folders are created automatically
this_fileXX.csv -- files are automatically generated with the underscore in the name; it is my understanding that data flows cannot handle this character in the file name

allsubs/
└── daily/
    ├── sub1/
    │   ├── previous-month/
    │   │   ├── this_file.csv
    │   │   └── this_file1.csv
    │   ├── previous-month/
    │   │   ├── this_file11.csv
    │   │   └── this_file12.csv
    │   └── this-month/
    └── subN/
        ├── previous-month/
        ├── previous-month/
        └── this-month/
            └── this_fileXX.csv
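Given that structure, a wildcard source that reaches the CSVs has to span the sub and month folder levels. A hypothetical example of the copy source store settings (the actual wildcard I used is in the screenshots in edit 2 below; the paths here are only to illustrate the nesting):

"storeSettings": {
    "type": "AzureBlobFSReadSettings",
    "recursive": true,
    "wildcardFolderPath": "daily/*/*",
    "wildcardFileName": "*_*.csv"
}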

Edit 2 - July 20

I think I'm getting closer, but there are still some small errors I do not see.

The pipeline now moves all the files from the container allsubs to the container renamed-files, but it is not renaming the files. It looks like so:

Get Metadata - from the dataset allContainers, it retrieves the folders with the Child Items field.

Dataset allContainers shown (preview works, linked service works, no parameters in this dataset): [screenshot: allContainers dataset]

[screenshot]

Next, the ForEach activity uses the output of Get Metadata for its items: @activity('Get Metadata1').output.childItems

[screenshot: ForEach settings]
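A sketch of what those two activities amount to in the pipeline JSON (only the relevant properties shown; the ForEach body with the copy is omitted):

{
    "name": "Get Metadata1",
    "type": "GetMetadata",
    "typeProperties": {
        "dataset": { "referenceName": "allContainers", "type": "DatasetReference" },
        "fieldList": [ "childItems" ]
    }
},
{
    "name": "ForEach1",
    "type": "ForEach",
    "dependsOn": [ { "activity": "Get Metadata1", "dependencyConditions": [ "Succeeded" ] } ],
    "typeProperties": {
        "items": {
            "value": "@activity('Get Metadata1').output.childItems",
            "type": "Expression"
        }
    }
}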

Next is the Copy Data activity within the ForEach.

The source is the allContainers dataset with the wildcard file path selected and Recursively checked; because of the following error, max concurrent connections is set to 1, but this did not resolve the error.

Error message:

Failure happened on 'Sink' side.
ErrorCode=AzureStorageOperationFailedConcurrentWrite,
'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,
Message=Error occurred when trying to upload a file.
It's possible because you have multiple concurrent copy activities
runs writing to the same file 'renamed-files/rlcosts51122/20220601-20220630/rlcosts51122_082dd29b-95b2-4da5-802a-935d762e89d8.csv'.
Check your ADF configuration.
,Source=Microsoft.DataTransfer.ClientLibrary,
''Type=Microsoft.WindowsAzure.Storage.StorageException,
Message=The remote server returned an error: (400) Bad Request.,
Source=Microsoft.WindowsAzure.Storage,
StorageExtendedMessage=The specified block list is invalid.
RequestId:b519219f-601e-000d-6c4c-9c9c5e000000
Time:2022-07-20T15:23:51.4342693Z,,
''Type=System.Net.WebException,
Message=The remote server returned an error: (400) Bad Request.,
Source=Microsoft.WindowsAzure.Storage,'

Copy Data source: [screenshot]

Copy Data sink: the dataset is dsRenamesink; it's simply another container in a different storage account. The linked service is set up correctly, and the dataset has the parameter renamedFile, but I suspect this is the source of my error. Still testing that.

Sink dataset dsRenamesink: [screenshot]

Parameter page: [screenshot]

Here's the sink in the Copy Data activity, where the renamed file is passed the iterator from ForEach1 like so:

@replace(item().name,'_','renameworked') -- so the underscore would be replaced with 'renameworked', easy enough to test.

[screenshot: copy data sink]
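In JSON terms, the sink side of the copy is passing that value into the dataset parameter roughly like this (a sketch of just the output reference):

"outputs": [
    {
        "referenceName": "dsRenamesink",
        "type": "DatasetReference",
        "parameters": {
            "renamedFile": "@replace(item().name,'_','renameworked')"
        }
    }
]

As far as I can tell, this only renames anything if dsRenamesink itself uses @dataset().renamedFile in its file name field.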

Debugging the pipeline:

The errors look to be consistent across the 7 failures, which were shown above as 'Failure happened on Sink side'.

However, going into the sink storage account, I can see that all of the files from the source were copied over, but the files were not renamed: [screenshot: sink container contents]

Pipeline output: [screenshot]

Error messages:

{
    "dataRead": 28901858,
    "dataWritten": 10006989,
    "filesRead": 4,
    "filesWritten": 0,
    "sourcePeakConnections": 1,
    "sinkPeakConnections": 1,
    "copyDuration": 7,
    "throughput": 4032.067,
    "errors": [
        {
            "Code": 24107,
            "Message": "Failure happened on 'Sink' side. ErrorCode=AzureStorageOperationFailedConcurrentWrite,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Error occurred when trying to upload a file. It's possible because you have multiple concurrent copy activities runs writing to the same file 'renamed-files/rlcosts51122/20220601-20220630/rlcosts51122_082dd29b-95b2-4da5-802a-935d762e89d8.csv'. Check your ADF configuration.,Source=Microsoft.DataTransfer.ClientLibrary,''Type=Microsoft.WindowsAzure.Storage.StorageException,Message=The remote server returned an error: (400) Bad Request.,Source=Microsoft.WindowsAzure.Storage,StorageExtendedMessage=The specified block list is invalid.\nRequestId:b519219f-601e-000d-6c4c-9c9c5e000000\nTime:2022-07-20T15:23:51.4342693Z,,''Type=System.Net.WebException,Message=The remote server returned an error: (400) Bad Request.,Source=Microsoft.WindowsAzure.Storage,'",
            "EventType": 0,
            "Category": 5,
            "Data": {
                "FailureInitiator": "Sink"
            },
            "MsgId": null,
            "ExceptionType": null,
            "Source": null,
            "StackTrace": null,
            "InnerEventInfos": []
        }
    ],
    "effectiveIntegrationRuntime": "AutoResolveIntegrationRuntime (Central US)",
    "usedDataIntegrationUnits": 4,
    "billingReference": {
        "activityType": "DataMovement",
        "billableDuration": [
            {
                "meterType": "AzureIR",
                "duration": 0.06666666666666667,
                "unit": "DIUHours"
            }
        ]
    },
    "usedParallelCopies": 1,
    "executionDetails": [
        {
            "source": {
                "type": "AzureBlobFS",
                "region": "Central US"
            },
            "sink": {
                "type": "AzureBlobStorage"
            },
            "status": "Failed",
            "start": "Jul 20, 2022, 10:23:44 am",
            "duration": 7,
            "usedDataIntegrationUnits": 4,
            "usedParallelCopies": 1,
            "profile": {
                "queue": {
                    "status": "Completed",
                    "duration": 3
                },
                "transfer": {
                    "status": "Completed",
                    "duration": 2,
                    "details": {
                        "listingSource": {
                            "type": "AzureBlobFS",
                            "workingDuration": 0
                        },
                        "readingFromSource": {
                            "type": "AzureBlobFS",
                            "workingDuration": 0
                        },
                        "writingToSink": {
                            "type": "AzureBlobStorage",
                            "workingDuration": 0
                        }
                    }
                }
            },
            "detailedDurations": {
                "queuingDuration": 3,
                "transferDuration": 2
            }
        }
    ],
    "dataConsistencyVerification": {
        "VerificationResult": "NotVerified"
    },
    "durationInQueue": {
        "integrationRuntimeQueue": 0
    }
}

All I wanted to do was remove the underscore from the file name so it works with data flows. I'm not sure what else to try next.

Next attempt - July 20

It appears that now I have been able to copy and rename some of the files.

I changed the sink dataset as follows: @concat(replace(dataset().renamedFile,'_','-'),'',formatDateTime(utcnow(),'yyyyMMddHHmmss'),'.csv') [screenshot]

[screenshot: sink dataset file name]
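So the file name of the sink dataset now carries that dynamic content, roughly like this fragment of the dataset JSON (container name from the sink container above; the rest is a sketch):

"typeProperties": {
    "location": {
        "type": "AzureBlobStorageLocation",
        "container": "renamed-files",
        "fileName": {
            "value": "@concat(replace(dataset().renamedFile,'_','-'),'',formatDateTime(utcnow(),'yyyyMMddHHmmss'),'.csv')",
            "type": "Expression"
        }
    }
}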

I also removed this parameter from the sink in the copy activity:

[screenshot: copy activity sink]

Upon debugging this pipeline, I get 1 file in the sink and it is named correctly, but there is still something wrong: [screenshot]

[screenshot: sink container]

Third attempt - 7/20

Further updates to get closer to the original answer. The sink dataset:

[screenshot: sink dataset]

Copy Data activity sink - the concat works: [screenshot]

Now, after debugging, I'm left with 1 file for each of the subs, so there is something still not quite correct.

[screenshot: sink container contents]

Answers (1)

Vamsi Bitra

I reproduced the same thing in my environment.

Go to the sink dataset and open it. First create a parameter, then add dynamic content for the file name; I used this expression: @dataset().sinkfilename

Ref1

Ref2

In the copy activity sink, under dataset properties, pass the filename value using the expression @replace(item().name,'_','-') to replace _ with -.

When you create a dataset parameter to pass the filename, the File extension property is automatically disabled.
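Put together, the two pieces look roughly like this in the JSON (a sketch; only the relevant fragments shown). In the sink dataset, the file name is just the parameter:

"fileName": {
    "value": "@dataset().sinkfilename",
    "type": "Expression"
}

and in the copy activity sink, the dataset reference supplies that parameter:

"parameters": {
    "sinkfilename": "@replace(item().name,'_','-')"
}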

Ref3

When the pipeline runs, you can see the file name has been renamed accordingly.

Ref5
