baatchen

Reputation: 489

Azure Data Factory: how to incrementally copy blob data to SQL

I have an Azure Blob container into which JSON files with data are dropped every 6 hours, and I want to use Azure Data Factory to copy them to an Azure SQL DB. The file names follow this pattern: "customer_year_month_day_hour_min_sec.json.data.json"

The blob container also holds other JSON data files, so I have to filter for the files I want in the dataset.

First question: how can I set the file path on the blob dataset so it only picks up the JSON files I want? I tried the wildcard *.data.json, but that doesn't work. The only filename wildcard I have gotten to work is *.json.

Second question: how can I copy data to Azure SQL only from the new files (matching the specific file pattern) that land in the blob storage? I have no control over the process that puts the data in the blob container and cannot move the files to another location, which makes this harder.

Please help.

Upvotes: 5

Views: 1643

Answers (2)

DTRT

Reputation: 11050

I understand your situation. It seems they've used a new platform to recreate a decades-old problem. :)

The pattern I would set up first looks something like this:

  1. Create a Storage Account Trigger that will fire on every new file in the source container.
  2. In the triggered pipeline, examine the blob name to see if it fits your pattern. If not, just end, taking no action. If it does, binary-copy the blob to an account/container your app owns, leaving the original in place (see the sketch after this list).
  3. Create another Trigger on your container that runs the import Pipeline.
  4. Run your import process.
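
A rough sketch of what step 2 could look like inside the triggered pipeline, written as an ADF If Condition activity in JSON. The parameter name fileName, the activity names, and the SourceBlobBinary / StagingBlobBinary dataset references are placeholders I'm assuming for illustration, not anything the answer specified:

    {
        "name": "IfBlobMatchesPattern",
        "type": "IfCondition",
        "typeProperties": {
            "expression": {
                "value": "@endswith(pipeline().parameters.fileName, '.data.json')",
                "type": "Expression"
            },
            "ifTrueActivities": [
                {
                    "name": "StageBlobCopy",
                    "type": "Copy",
                    "inputs": [ { "referenceName": "SourceBlobBinary", "type": "DatasetReference" } ],
                    "outputs": [ { "referenceName": "StagingBlobBinary", "type": "DatasetReference" } ],
                    "typeProperties": {
                        "source": { "type": "BinarySource" },
                        "sink": { "type": "BinarySink" }
                    }
                }
            ]
        }
    }

With no ifFalseActivities, a non-matching blob simply falls through and the run ends without copying anything.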

A couple of caveats your management has to understand: you can be very, very reliable, but you cannot guarantee completeness, because there is no transaction/contract between you and the source container. Also, there may be a sequence gap, since a small file can usually finish processing while a larger file is still being processed.

If for any reason you do miss a file, all you need to do is copy it to your container where your process will pick it up. You can load all previous blobs in the same way.

Upvotes: 1

Wang Zhang

Reputation: 327

You could use an ADF event trigger to achieve this.

Define your event trigger as 'blob created' and specify the blobPathBeginsWith and blobPathEndsWith properties based on your filename pattern.
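
A rough JSON sketch of such a trigger definition, based on the question's filename pattern; the trigger name, container name, pipeline name, parameter names, and storage account scope are all placeholders to replace with your own:

    {
        "name": "NewDataFileTrigger",
        "properties": {
            "type": "BlobEventsTrigger",
            "typeProperties": {
                "blobPathBeginsWith": "/yourcontainer/blobs/",
                "blobPathEndsWith": ".data.json",
                "ignoreEmptyBlobs": true,
                "scope": "/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Storage/storageAccounts/<storage-account>",
                "events": [ "Microsoft.Storage.BlobCreated" ]
            },
            "pipelines": [
                {
                    "pipelineReference": {
                        "referenceName": "CopyNewFileToSql",
                        "type": "PipelineReference"
                    },
                    "parameters": {
                        "sourceFolder": "@triggerBody().folderPath",
                        "sourceFile": "@triggerBody().fileName"
                    }
                }
            ]
        }
    }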

For the first question: when an event trigger fires for a specific blob, the event captures the folder path and file name of the blob in the properties @triggerBody().folderPath and @triggerBody().fileName. You need to map those properties to pipeline parameters and pass the @pipeline().parameters.parameterName expression to the fileName in your copy activity.
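
On the pipeline side, a minimal sketch of how those parameters could flow into the copy activity, assuming the source JSON dataset declares folderPath and fileName parameters used in its file path and the sink is an Azure SQL table dataset; all names below are placeholders:

    {
        "name": "CopyNewFileToSql",
        "properties": {
            "parameters": {
                "sourceFolder": { "type": "string" },
                "sourceFile": { "type": "string" }
            },
            "activities": [
                {
                    "name": "CopyJsonToSql",
                    "type": "Copy",
                    "inputs": [
                        {
                            "referenceName": "SourceJsonDataset",
                            "type": "DatasetReference",
                            "parameters": {
                                "folderPath": "@pipeline().parameters.sourceFolder",
                                "fileName": "@pipeline().parameters.sourceFile"
                            }
                        }
                    ],
                    "outputs": [
                        { "referenceName": "AzureSqlTableDataset", "type": "DatasetReference" }
                    ],
                    "typeProperties": {
                        "source": { "type": "JsonSource" },
                        "sink": { "type": "AzureSqlSink" }
                    }
                }
            ]
        }
    }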

This also answers the second question: each time the trigger fires, you'll get the folder path and file name of the newly created blob in @triggerBody().folderPath and @triggerBody().fileName. Thanks.

Upvotes: 3
