GettingItDone

Reputation: 603

Enumerate all files in a container and copy each file in a foreach

Need to load all .csv files in Azure Blob Container into SQL database.

I tried using a wildcard (*.*) on the filename in the dataset (which uses the linked service that connects to the blob) and outputting the itemName in the Get Metadata activity.

When executing in debug mode, no list of filenames is returned in the Output window. When I reference the output with an expression, the error states that the type is String, not Collection.

Upvotes: 1

Views: 3330

Answers (3)

Joel Cochran

Reputation: 7758

For this kind of task, I use the Get Metadata activity and process the results with a For Each activity. Inside the For Each activity, you can have a simple Copy activity to copy the CSV files to SQL tables, or if the work is more complex you can use Data Flow.

Some useful tips:

  1. In the Get Metadata activity, under the DataSet tab > "Field list", select the "Child Items" option:


  2. I recommend adding a "Filter" activity after the Get Metadata to ensure that you are only processing files, and, optionally, only files with expected extensions. You do this in the Settings tab.


  3. In the For Each activity, on the Settings tab, set the Items based on the output of the Filter activity.


  4. Inside the For Each, at the activity level, you reference the current instance with "@item().name".

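The steps above can be sketched as a pipeline JSON fragment. This is a minimal sketch, not an export of the author's pipeline; the activity and dataset names (Get Metadata1, Filter1, BlobCsvFolder, Copy1) are placeholders:

```json
{
    "activities": [
        {
            "name": "Get Metadata1",
            "type": "GetMetadata",
            "typeProperties": {
                "dataset": { "referenceName": "BlobCsvFolder", "type": "DatasetReference" },
                "fieldList": [ "childItems" ]
            }
        },
        {
            "name": "Filter1",
            "type": "Filter",
            "dependsOn": [ { "activity": "Get Metadata1", "dependencyConditions": [ "Succeeded" ] } ],
            "typeProperties": {
                "items": { "value": "@activity('Get Metadata1').output.childItems", "type": "Expression" },
                "condition": { "value": "@and(equals(item().type, 'File'), endswith(item().name, '.csv'))", "type": "Expression" }
            }
        },
        {
            "name": "ForEach1",
            "type": "ForEach",
            "dependsOn": [ { "activity": "Filter1", "dependencyConditions": [ "Succeeded" ] } ],
            "typeProperties": {
                "items": { "value": "@activity('Filter1').output.value", "type": "Expression" },
                "activities": [
                    { "name": "Copy1", "type": "Copy" }
                ]
            }
        }
    ]
}
```

Inside the Copy activity, the source dataset's filename parameter would be bound to @item().name.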

One of my production pipelines implements exactly this pattern.


Upvotes: 3

ASH

Reputation: 20342

Are you actually using *.*? That may be too vague for the system to interpret. Maybe try something a little more descriptive, like ???20190101.json, or whatever matches the patterns of your data sets.
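ADF's wildcard filter uses * and ? much like shell globbing, so you can sanity-check a pattern locally with Python's fnmatch. The file names below are made up for illustration:

```python
from fnmatch import fnmatch

# Hypothetical file names, just for illustration.
files = ["abc20190101.json", "xyz20190101.json",
         "abc20190102.json", "notes.txt"]

# "*.*" matches any name containing a dot -- far too broad.
broad = [f for f in files if fnmatch(f, "*.*")]

# "???20190101.json" pins down everything except a 3-character prefix.
narrow = [f for f in files if fnmatch(f, "???20190101.json")]

print(broad)   # every file matches
print(narrow)  # only the two 20190101 files match
```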

I encountered a weird problem a couple of weeks ago while using the wildcard character to iterate through a bunch of files. I was always starting on row 3, and as luck would have it, some files didn't have a row 3. A handful of files had metadata in row 1, field names in row 2, and no row 3, so I was getting some weird errors. I changed the row start to row 2 and everything worked fine after that.

Also, check out the link below.

https://azure.microsoft.com/en-us/updates/data-factory-supports-wildcard-file-filter-for-copy-activity/

Upvotes: 0

Jay Gong

Reputation: 23792

For your needs, you could take a look at the Lookup activity.

Lookup activity can retrieve a dataset from any of the Azure Data Factory-supported data sources. Use it in the following scenario:

Dynamically determine which objects to operate on in a subsequent activity, instead of hard coding the object name. Some object examples are files and tables.

For example, my test container holds two csv files.


Configure a blob storage dataset:


Configure Lookup Activity:


Output:

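As a rough sketch (the names Lookup1 and BlobCsvDataset are placeholders, not taken from the screenshots), the Lookup activity for this pattern could be configured with firstRowOnly set to false so that every row is returned:

```json
{
    "name": "Lookup1",
    "type": "Lookup",
    "typeProperties": {
        "source": { "type": "DelimitedTextSource" },
        "dataset": { "referenceName": "BlobCsvDataset", "type": "DatasetReference" },
        "firstRowOnly": false
    }
}
```

Downstream activities can then reference the retrieved rows with @activity('Lookup1').output.value. Note that Lookup reads the contents of the files rather than listing them, so keep its output size limits in mind.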

Upvotes: 1
