Soben_SPDEV
Soben_SPDEV

Reputation: 37

Azure Data Factory - handling lookup activity limitation of 4194304 in size

I have a azure data factory pipeline with a lookup activity that check a JSON file.

The size is like below in azure: Azure Blog Size Screenshot

and when I download it, I see below values for the file. so it's not larger that the value the error states: "The size 5012186 of lookup activity result exceeds the limitation 4194304"

Size of the data as opened in Notepad ++

Also below is the design of my pipeline that gets stuck: Pipeline design - Lookup Activity to Read my model.json file to retrieve metadata

Any ideas on how to tackle this issue? thanks in advance

Upvotes: 1

Views: 6090

Answers (1)

Pratik Lad
Pratik Lad

Reputation: 8301

As lookup has the limitation of 5000 rows, you can try the below workaround for this.

To overcome this the workaround is as mentioned in Microsoft Document

Design a two-level pipeline where the outer pipeline iterates over an inner pipeline, which retrieves data that doesn't exceed the maximum rows or size.

Possible solution:

First, try to save your files list as JSON files to a folder of Blob storage with the size of 5000 rows.

create Get Metadata activity which can fetch the files from the folder

Get Metadata activity settings enter image description here

Then create For-each activity to Iterate over files

In for-each activity setting give items as @activity('Get Metadata1').output.childItems enter image description here

For the files create a dataset and give the folder name manually and for filename use the dataset parameter, which we can give the filename in the lookup inside the parent ForEach.

enter image description here

Lookup activity inside Parent ForEach give the file name as @string(item().name)

enter image description here

Execute Pipeline activity:

Before this create an array parameter in the child pipeline and pass the look up output inside ForEach to that in the Execute Pipeline activity. enter image description here

Give look up output @activity('Lookup1').output.value

enter image description here

Now create inside the Child Pipeline and give the array parameter to the ForEach as @pipeline().parameters.ok enter image description here

You can use which ever activity you want inside this ForEach, here I have used append.
enter image description here

Then create result1 variable as array and give value as @variables('arrayid') enter image description here

The Output will be the array of all ids in the file

enter image description hereBgULa02.png)

Upvotes: 3

Related Questions