Dhiraj

Reputation: 3696

ADF not honoring sink block size in MB (100) for copy activity with ADX as source

I am using an ADF copy activity with an ADX dataset as the source and ADLS Gen2 as the sink. In the sink settings I specified a Block size (MB) of 100, so I expected that if the total data written is, say, 1 GB, roughly ten blobs of ~100 MB each would be produced.

[Screenshot: copy activity sink settings with Block size (MB) set to 100]

When I ran the pipeline, a single blob of 1.6 GB was produced despite the 100 MB block size. I experimented with this value multiple times, but it had no impact on the number of blobs produced in the sink; the setting appears to be ignored entirely. Or does it simply not apply when ADX is the source?
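For reference, this is roughly how the sink is defined in the pipeline JSON. A minimal sketch, assuming a DelimitedText sink on ADLS Gen2; the Block size (MB) box in the UI corresponds to the blockSizeInMB property of the store settings:

```json
{
    "sink": {
        "type": "DelimitedTextSink",
        "storeSettings": {
            "type": "AzureBlobFSWriteSettings",
            "blockSizeInMB": 100
        },
        "formatSettings": {
            "type": "DelimitedTextWriteSettings",
            "fileExtension": ".csv"
        }
    }
}
```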

Upvotes: 0

Views: 1177

Answers (2)

Anil Kumar

Reputation: 565

Strangely, Block size (MB) is not working on my side either. I tried the other option, Max rows per file, and it looks promising: it creates multiple files with at most 200000 rows in each file.

[Screenshot: sink settings with Max rows per file set to 200000]
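For anyone who prefers editing the JSON directly: as far as I can tell, this option maps to the maxRowsPerFile property of the format settings. A sketch, assuming a DelimitedText sink (fileNamePrefix is optional and controls the naming of the split files):

```json
{
    "sink": {
        "type": "DelimitedTextSink",
        "storeSettings": {
            "type": "AzureBlobFSWriteSettings"
        },
        "formatSettings": {
            "type": "DelimitedTextWriteSettings",
            "fileExtension": ".csv",
            "maxRowsPerFile": 200000,
            "fileNamePrefix": "part"
        }
    }
}
```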

Upvotes: 1

Steve Johnson

Reputation: 8660

"When I ran the pipeline, a single blob of 1.6 GB was produced despite the 100 MB block size."

Actually, a single 1.6 GB file is the expected result. During the copy, the data is divided into several chunks, which are uploaded to Azure Data Lake Storage Gen2 and then committed together to form the single 1.6 GB file. In other words, Block size (MB) controls the size of the upload chunks of a block blob, not the number of output files. You can refer to this documentation to learn more about block blobs.

If you want to check whether the Block size (MB) option works, you can look at the corresponding log file in the $logs folder of the Azure Data Lake Storage Gen2 account. It will show you the count of chunks.

[Screenshot: $logs entries showing the individual chunk uploads]
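If you'd rather check programmatically than dig through the logs, the committed block list of the output blob shows the same information. A minimal sketch using the azure-storage-blob Python SDK; the account, container, blob name, and credential below are placeholders:

```python
# Count the committed blocks of the ADF output blob.
# <account>, <filesystem>, <path/to/output.csv> and the credential are placeholders.
from azure.storage.blob import BlobClient

blob = BlobClient(
    account_url="https://<account>.blob.core.windows.net",
    container_name="<filesystem>",
    blob_name="<path/to/output.csv>",
    credential="<account-key-or-sas-token>",
)

# get_block_list returns (committed_blocks, uncommitted_blocks)
committed, _ = blob.get_block_list(block_list_type="committed")
print(f"committed blocks: {len(committed)}")
for block in committed:
    # each committed block should be at most the configured block size
    print(block.id, block.size)
```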

Upvotes: 3
