ASH

Reputation: 20362

How to get Azure Data Factory to Loop Through Files in a Folder

I am looking at the link below.

https://azure.microsoft.com/en-us/updates/data-factory-supports-wildcard-file-filter-for-copy-activity/

We are supposed to be able to use wildcard characters in folder paths and file names. If we click on the Copy activity and then on 'Source', we see the source settings view.

I would like to loop through months and days, so it should be something like the view below.

[screenshot: the attempted wildcard path in the Source settings]

Of course that doesn't actually work. I'm getting errors that read: ErrorCode: 'PathNotFound'. Message: 'The specified path does not exist.' How can I get the tool to recursively iterate through all files in all folders, given a specific pattern of strings in a file path and file name? Thanks.

Upvotes: 7

Views: 28256

Answers (1)

Anish K

Reputation: 818

I would like to loop through months and days

  • To do this, you can pass two parameters from your pipeline to the activity so that the path can be built dynamically based on those parameters. ADF V2 allows you to pass parameters.

Let's go through the process step by step:

1. Create a pipeline and define two parameters in it for your month and day.

Note: These parameters can also be passed in from the output of other activities if needed; they are declared as shown in the snippet below. Reference: Parameters in ADF
[screenshot: passing parameters to the copy activity through the pipeline]
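
For reference, the two pipeline parameters are declared like this (the same block appears in the full pipeline template further down) and are then referenced in expressions as @pipeline().parameters.month and @pipeline().parameters.day:

    "parameters": {
        "month": {
            "type": "string"
        },
        "day": {
            "type": "string"
        }
    }
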
2. Create two datasets.

2.1 Sink Dataset - Blob Storage in this example. Link it to your Linked Service and provide the container name (make sure it exists). Again, if needed, the container name can be passed as a parameter.

2.2 Source Dataset - Blob Storage here again, or whatever fits your need. Link it to your Linked Service and provide the container name (make sure it exists). Again, if needed, it can be passed as a parameter.
Note: 1. The folder path decides where the data is copied to. If the container does not exist, the activity will create it for you, and if the file already exists it will be overwritten by default.

2. Pass parameters to the dataset if you want to build the output path dynamically. Here I have created two dataset parameters named monthcopy and datacopy (see the excerpt below).
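
For reference, the dataset declares the two parameters and then consumes them in its folder path; the excerpt below is taken from the sink dataset template (DelimitedText2) further down:

    "parameters": {
        "monthcopy": {
            "type": "string"
        },
        "datacopy": {
            "type": "string"
        }
    }

    "folderPath": {
        "value": "@concat(formatDateTime(adddays(utcnow(),-1),'yyyy'),'/',dataset().monthcopy,'/',dataset().datacopy)",
        "type": "Expression"
    }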

3. Create Copy Activity in the pipeline.

Wildcard Folder Path:

    @{concat(formatDateTime(adddays(utcnow(),-1),'yyyy'),'/',string(pipeline().parameters.month),'/',string(pipeline().parameters.day),'/*')}

where:
    the path resolves to <current year>/<month passed in>/<day passed in>/* (the * matches any folder one level below that path)
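
For example, if the run date (minus one day, per the adddays(utcnow(),-1) part) falls in 2021 and the pipeline is invoked with month = 01 and day = 15 (hypothetical values), the wildcard folder path evaluates to:

    2021/01/15/*

Combined with the wildcard file name *.csv in the source settings, the Copy activity then picks up every CSV file under the folders matched by that path.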


Test Result:

[screenshot: test run result]
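
When the pipeline is triggered (from a debug run, a trigger, or the REST API), the two parameters are supplied as a simple JSON object of name/value pairs, for example (values are hypothetical):

    {
        "month": "01",
        "day": "15"
    }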

JSON Template for the pipeline:

{
    "name": "pipeline2",
    "properties": {
        "activities": [
            {
                "name": "Copy Data1",
                "type": "Copy",
                "dependsOn": [],
                "policy": {
                    "timeout": "7.00:00:00",
                    "retry": 0,
                    "retryIntervalInSeconds": 30,
                    "secureOutput": false,
                    "secureInput": false
                },
                "userProperties": [],
                "typeProperties": {
                    "source": {
                        "type": "DelimitedTextSource",
                        "storeSettings": {
                            "type": "AzureBlobStorageReadSettings",
                            "recursive": true,
                            "wildcardFolderPath": {
                                "value": "@{concat(formatDateTime(adddays(utcnow(),-1),'yyyy'),'/',string(pipeline().parameters.month),'/',string(pipeline().parameters.day),'/*')}",
                                "type": "Expression"
                            },
                            "wildcardFileName": "*.csv",
                            "enablePartitionDiscovery": false
                        },
                        "formatSettings": {
                            "type": "DelimitedTextReadSettings"
                        }
                    },
                    "sink": {
                        "type": "DelimitedTextSink",
                        "storeSettings": {
                            "type": "AzureBlobStorageWriteSettings"
                        },
                        "formatSettings": {
                            "type": "DelimitedTextWriteSettings",
                            "quoteAllText": true,
                            "fileExtension": ".csv"
                        }
                    },
                    "enableStaging": false
                },
                "inputs": [
                    {
                        "referenceName": "DelimitedText1",
                        "type": "DatasetReference"
                    }
                ],
                "outputs": [
                    {
                        "referenceName": "DelimitedText2",
                        "type": "DatasetReference",
                        "parameters": {
                            "monthcopy": {
                                "value": "@pipeline().parameters.month",
                                "type": "Expression"
                            },
                            "datacopy": {
                                "value": "@pipeline().parameters.day",
                                "type": "Expression"
                            }
                        }
                    }
                ]
            }
        ],
        "parameters": {
            "month": {
                "type": "string"
            },
            "day": {
                "type": "string"
            }
        },
        "annotations": []
    }
}

JSON Template for the Source dataset (DelimitedText1):

{
    "name": "DelimitedText1",
    "properties": {
        "linkedServiceName": {
            "referenceName": "AzureBlobStorage1",
            "type": "LinkedServiceReference"
        },
        "annotations": [],
        "type": "DelimitedText",
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "container": "corpdata"
            },
            "columnDelimiter": ",",
            "escapeChar": "\\",
            "quoteChar": "\""
        },
        "schema": []
    }
}

JSON Template for the Sink dataset (DelimitedText2):

{
    "name": "DelimitedText2",
    "properties": {
        "linkedServiceName": {
            "referenceName": "AzureBlobStorage1",
            "type": "LinkedServiceReference"
        },
        "parameters": {
            "monthcopy": {
                "type": "string"
            },
            "datacopy": {
                "type": "string"
            }
        },
        "annotations": [],
        "type": "DelimitedText",
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "folderPath": {
                    "value": "@concat(formatDateTime(adddays(utcnow(),-1),'yyyy'),dataset().monthcopy,'/',dataset().datacopy)",
                    "type": "Expression"
                },
                "container": "copycorpdata"
            },
            "columnDelimiter": ",",
            "escapeChar": "\\",
            "quoteChar": "\""
        },
        "schema": []
    }
}
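
With the same hypothetical values as above (month = 01, day = 15, and the year part resolving to 2021), the folderPath of this dataset evaluates to:

    2021/01/15

inside the copycorpdata container, so the sink mirrors the folder structure of the source path.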

Upvotes: 7
