Sam

Reputation: 71

Moving - not copying - data in Azure Data Factory

I'd like to set up an Azure Data Factory pipeline which performs a move (i.e. copy, verify, delete) operation rather than just a copy operation between Blob Storage and a Data Lake Store. I cannot seem to find any detail on how to do this.

Upvotes: 7

Views: 11366

Answers (3)

Alex KeySmith

Reputation: 17111

Just to add a contemporary update for anyone coming across this.

Data Factory V2 has recently released a dedicated Delete Activity.

At the time of writing, this supports:

  • Azure Blob storage
  • Azure Data Lake Storage Gen1
  • Azure Data Lake Storage Gen2
  • File System
  • FTP
  • SFTP
  • Amazon S3

{
    "name": "DeleteActivity",
    "type": "Delete",
    "typeProperties": {
        "dataset": {
            "referenceName": "<dataset name>",
            "type": "DatasetReference"
        },
        "recursive": true/false,
        "maxConcurrentConnections": <number>,
        "enableLogging": true/false,
        "logStorageSettings": {
            "linkedServiceName": {
                "referenceName": "<name of linked service>",
                "type": "LinkedServiceReference"
            },
            "path": "<path to save log file>"
        }
    }
}

Taken from: https://learn.microsoft.com/en-gb/azure/data-factory/delete-activity
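
To get the "move" behaviour asked about in the question, one option is to chain a Copy activity and the Delete activity so that the delete only runs when the copy has succeeded. The following is a minimal sketch, not a tested pipeline: the pipeline name, activity names, and dataset names (SourceBlobDataset, SinkDataLakeDataset) are placeholders I made up, and the source/sink types assume Blob Storage to Data Lake Storage Gen1. It does not include a separate verify step; it relies on the Copy activity reporting success before the source is deleted.

```json
{
    "name": "MoveBlobToDataLake",
    "properties": {
        "activities": [
            {
                "name": "CopyToDataLake",
                "type": "Copy",
                "inputs": [
                    { "referenceName": "SourceBlobDataset", "type": "DatasetReference" }
                ],
                "outputs": [
                    { "referenceName": "SinkDataLakeDataset", "type": "DatasetReference" }
                ],
                "typeProperties": {
                    "source": { "type": "BlobSource", "recursive": true },
                    "sink": { "type": "AzureDataLakeStoreSink" }
                }
            },
            {
                "name": "DeleteSourceBlobs",
                "type": "Delete",
                "dependsOn": [
                    {
                        "activity": "CopyToDataLake",
                        "dependencyConditions": [ "Succeeded" ]
                    }
                ],
                "typeProperties": {
                    "dataset": {
                        "referenceName": "SourceBlobDataset",
                        "type": "DatasetReference"
                    },
                    "recursive": true,
                    "enableLogging": false
                }
            }
        ]
    }
}
```

Because the Delete activity depends on the "Succeeded" condition, a failed copy leaves the source files untouched and the pipeline can simply be re-run.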

Upvotes: 2

wBob

Reputation: 14399

Azure Data Factory does not have a built-in activity or option to Move files as opposed to Copy them. You can, however, do this with a Custom Activity.

This example on GitHub shows how to do this with Azure Blob:

...
blob.DeleteIfExists();
...

https://github.com/Azure/Azure-DataFactory/tree/master/Samples/DeleteBlobFileFolderCustomActivity

If you feel this is an important feature, please add a feedback request:

https://feedback.azure.com/forums/270578-data-factory

Update: a built-in Delete activity has since been added:

https://azure.microsoft.com/en-us/blog/clean-up-files-by-built-in-delete-activity-in-azure-data-factory/

Upvotes: 2

Sharon Lo

Reputation: 39

From the product team on ADF here. While we're working on "Delete" as a first-class activity in ADF, we have published a sample on GitHub showing how users can delete files (in this case, Azure Blob) once they've been copied using the ADF copy activity.

https://github.com/Azure/Azure-DataFactory/tree/master/Samples/DeleteBlobFileFolderCustomActivity

This is possible using the ADF custom .NET activity. The sample showcases the following:

  • A C# file which can be used as part of an ADF custom .NET activity to delete particular blobs or an entire folder.
  • Users need to provide the Azure Blob datasets to be deleted as a comma-separated list in the 'inputToDelete' extended property in the pipeline JSON. The custom .NET activity will retrieve each dataset's FolderPath and filename properties. If only FolderPath is specified, it will delete all the contents of the blob folder.

Contents of the GitHub repo:

  • DeleteFromBlobActivity.cs - C# file to be used as part of an ADF custom .NET activity to delete blob folders.
  • PipelineSample.json - Shows how to invoke the ADF custom .NET delete blob activity. Replace the placeholders for dataset names, schedule and linked services in the sample pipeline JSON.

Upvotes: 0
