Reputation: 479
I have a C# Azure Function that reads a file's content from Blob storage and writes it to an Azure Data Lake destination. The code works perfectly fine with large files (~8 MB and above), but with small files the destination file is written with 0 bytes. I tried lowering the chunk size and setting the number of parallel threads to 1, but the behavior remains the same. I am running the code locally from Visual Studio 2017.
Here is the code snippet I am using. I have gone through the documentation on Parallel.ForEach pitfalls (https://learn.microsoft.com/en-us/dotnet/standard/parallel-programming/potential-pitfalls-in-data-and-task-parallelism) but didn't come across anything specific to file size.
    int bufferLength = 1 * 1024 * 1024; // 1 MB chunk
    long blobRemainingLength = blob.Properties.Length;
    var outPutStream = new MemoryStream();

    // Split the blob into (offset, length) ranges of at most bufferLength bytes.
    Queue<KeyValuePair<long, long>> queues = new Queue<KeyValuePair<long, long>>();
    long offset = 0;
    while (blobRemainingLength > 0)
    {
        long chunkLength = (long)Math.Min(bufferLength, blobRemainingLength);
        queues.Enqueue(new KeyValuePair<long, long>(offset, chunkLength));
        offset += chunkLength;
        blobRemainingLength -= chunkLength;
    }
    Console.WriteLine("Number of Queues: " + queues.Count);

    // Download each range in parallel, then write it to the destination under a lock.
    // mystream is the destination stream; it is declared outside this snippet.
    Parallel.ForEach(queues,
        new ParallelOptions()
        {
            // Gets or sets the maximum number of concurrent tasks
            MaxDegreeOfParallelism = 10
        }, (queue) =>
        {
            using (var ms = new MemoryStream())
            {
                // Key is the range offset, Value is the range length.
                blob.DownloadRangeToStreamAsync(ms, queue.Key, queue.Value).GetAwaiter().GetResult();
                lock (mystream)
                {
                    var bytes = ms.ToArray();
                    Console.WriteLine("Processing on thread {0}", Thread.CurrentThread.ManagedThreadId);
                    mystream.Write(bytes, 0, bytes.Length);
                }
            }
        });
Appreciate all the help!!
Upvotes: 2
Views: 272
Reputation: 479
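I found the issue with my code. The ADL stream writer was not being flushed and disposed properly, so data still sitting in its buffer never reached the destination file. After adding the necessary flush and dispose calls, parallelization works fine with both small and large files.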
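For anyone hitting the same symptom, the effect can be reproduced without any Azure dependency. The sketch below is not the code from my function: it uses BufferedStream over a MemoryStream as a stand-in for the ADL writer, which is an assumption, and the real writer's buffer size and flushing rules may differ. FlushDemo and BytesReachingDestination are hypothetical names made up for this illustration.

```csharp
using System;
using System.IO;

public static class FlushDemo
{
    // Writes payload through a buffered wrapper (a stand-in for the ADL writer,
    // which is an assumption) and returns how many bytes actually reached the
    // underlying destination stream.
    public static long BytesReachingDestination(byte[] payload, bool dispose)
    {
        var destination = new MemoryStream();
        var buffered = new BufferedStream(destination, 4096); // 4 KB buffer

        buffered.Write(payload, 0, payload.Length);

        if (dispose)
        {
            // Dispose flushes any buffered bytes and closes the destination.
            buffered.Dispose();
        }

        // MemoryStream.ToArray still works after the stream has been closed.
        return destination.ToArray().Length;
    }
}
```

Calling FlushDemo.BytesReachingDestination(new byte[100], false) shows the small payload stuck in the buffer, while passing true for dispose pushes it through. A payload larger than the buffer tends to reach the destination even without the dispose, which would match what I saw with large files. In real code, wrapping the writer in a using block is the simplest way to guarantee the flush.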
Thanks for the suggestions!!
Upvotes: 1