Dario Federici

Reputation: 1258

azure data factory recursive copy from container

Hi, I am using Azure Data Factory for a Copy activity. I want the copy to be recursive across a container and its subfolders, which are organized as follows: myfolder/Year/Month/Day/Hour/New_Generated_File.csv

The files that I generate and drop into the folder always have a different name.

The problem is that the activity seems to wait forever.

The pipeline is scheduled hourly.

I'm attaching the JSON for the dataset and the linked service.

Dataset:

{
    "name": "Txns_In_Blob",
    "properties": {
        "structure": [
            {
                "name": "Column0",
                "type": "String"
            },
            [....Other Columns....]
        ],
        "published": false,
        "type": "AzureBlob",
        "linkedServiceName": "LinkedService_To_Blob",
        "typeProperties": {
            "folderPath": "uploadtransactional/yearno={Year}/monthno={Month}/dayno={Day}/hourno={Hour}/{Custom}.csv",
            "format": {
                "type": "TextFormat",
                "rowDelimiter": "\n",
                "columnDelimiter": "    "
            }
        },
        "availability": {
            "frequency": "Hour",
            "interval": 1
        },
        "external": true,
        "policy": {}
    }
}

Linked Service:

{
    "name": "LinkedService_To_Blob",
    "properties": {
        "description": "",
        "hubName": "dataorchestrationsystem_hub",
        "type": "AzureStorage",
        "typeProperties": {
            "connectionString": "DefaultEndpointsProtocol=https;AccountName=wizestorage;AccountKey=**********"
        }
    }
}

Upvotes: 1

Views: 3012

Answers (1)

Sandesh

Reputation: 3004

It is not mandatory to give a file name in the dataset's folderPath property. Just remove the file name, and Data Factory will load all the files in that folder for you. Note that the {Year}, {Month}, {Day}, and {Hour} variables must also be declared in a partitionedBy section; without it they are never resolved, the external dataset's path never matches an existing blob, and the slice keeps waiting:

{
    "name": "Txns_In_Blob",
    "properties": {
        "structure": [
            {
                "name": "Column0",
                "type": "String"
            },
            [....Other Columns....]
        ],
        "published": false,
        "type": "AzureBlob",
        "linkedServiceName": "LinkedService_To_Blob",
        "typeProperties": {
            "folderPath": "uploadtransactional/yearno={Year}/monthno={Month}/dayno={Day}/hourno={Hour}/",
            "partitionedBy": [
                { "name": "Year", "value": { "type": "DateTime", "date": "SliceStart", "format": "yyyy" } },
                { "name": "Month", "value": { "type": "DateTime", "date": "SliceStart", "format": "MM" } },
                { "name": "Day", "value": { "type": "DateTime", "date": "SliceStart", "format": "dd" } },
                { "name": "Hour", "value": { "type": "DateTime", "date": "SliceStart", "format": "HH" } }
            ],
            "format": {
                "type": "TextFormat",
                "rowDelimiter": "\n",
                "columnDelimiter": "    "
            }
        },
        "availability": {
            "frequency": "Hour",
            "interval": 1
        },
        "external": true,
        "policy": {}
    }
}

With the above folderPath, Data Factory generates the runtime value uploadtransactional/yearno=2016/monthno=05/dayno=30/hourno=07/ for a pipeline slice that starts at 2016-05-30 07:00 UTC.
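For completeness, a minimal sketch of the pipeline side that consumes this dataset with a Copy activity. The pipeline name, the output dataset Txns_Out, the SqlSink, and the start/end dates are placeholders, not from the original post; setting "recursive": true on the BlobSource makes the copy also pick up files in subfolders of the resolved path:

{
    "name": "HourlyCopyPipeline",
    "properties": {
        "activities": [
            {
                "name": "CopyTxnsFromBlob",
                "type": "Copy",
                "inputs": [ { "name": "Txns_In_Blob" } ],
                "outputs": [ { "name": "Txns_Out" } ],
                "typeProperties": {
                    "source": {
                        "type": "BlobSource",
                        "recursive": true
                    },
                    "sink": {
                        "type": "SqlSink"
                    }
                },
                "scheduler": {
                    "frequency": "Hour",
                    "interval": 1
                },
                "policy": {
                    "timeout": "01:00:00",
                    "concurrency": 1
                }
            }
        ],
        "start": "2016-05-30T00:00:00Z",
        "end": "2016-06-30T00:00:00Z"
    }
}

Swap SqlSink for whatever sink type matches your destination, and make sure the pipeline's start/end window covers the hourly slices you expect to run.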

Upvotes: 2
