Satya Azure

Reputation: 479

C# parallel copy - issue with small files

I have a C# Azure Function that reads a file's content from Blob storage and writes it to an Azure Data Lake destination. The code works fine with large files (~8 MB and above), but with small files the destination file is written with 0 bytes. I tried lowering the chunk size and setting the number of parallel threads to 1, but the behavior stays the same. I am running the code locally from Visual Studio 2017.

Below is the code snippet I am using. I have gone through the documentation on Parallel.ForEach pitfalls but didn't come across anything specific to file size. (https://learn.microsoft.com/en-us/dotnet/standard/parallel-programming/potential-pitfalls-in-data-and-task-parallelism)

        int bufferLength = 1 * 1024 * 1024; // 1 MB chunk
        long blobRemainingLength = blob.Properties.Length;
        var outPutStream = new MemoryStream();
        Queue<KeyValuePair<long, long>> queues = new Queue<KeyValuePair<long, long>>();

        long offset = 0;
        while (blobRemainingLength > 0)
        {
            long chunkLength = (long)Math.Min(bufferLength, blobRemainingLength);
            queues.Enqueue(new KeyValuePair<long, long>(offset, chunkLength));
            offset += chunkLength;
            blobRemainingLength -= chunkLength;
        }
        Console.WriteLine("Number of Queues: " + queues.Count);

        Parallel.ForEach(queues,
            new ParallelOptions()
            {
                // Gets or sets the maximum number of concurrent tasks
                MaxDegreeOfParallelism = 10
            },
            (queue) =>
            {
                using (var ms = new MemoryStream())
                {
                    blob.DownloadRangeToStreamAsync(ms, queue.Key, queue.Value)
                        .GetAwaiter().GetResult();
                    lock (mystream)
                    {
                        var bytes = ms.ToArray();
                        Console.WriteLine("Processing on thread {0}",
                            Thread.CurrentThread.ManagedThreadId);
                        mystream.Write(bytes, 0, bytes.Length);
                    }
                }
            });
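
A side note, separate from the 0-byte symptom: Parallel.ForEach does not guarantee the order in which chunks finish, so appending each chunk to a shared stream inside a lock may write them out of order when MaxDegreeOfParallelism is greater than 1. Below is a minimal sketch of placing each chunk at its own offset instead. It is only an illustration: `ChunkedCopySketch`, `BuildRanges` and `CopyInParallel` are hypothetical names, and an in-memory byte array stands in for the blob and for the Data Lake destination.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ChunkedCopySketch
{
    // Splits [0, totalLength) into (offset, length) pairs of at most chunkSize bytes.
    public static List<KeyValuePair<long, long>> BuildRanges(long totalLength, int chunkSize)
    {
        var ranges = new List<KeyValuePair<long, long>>();
        long offset = 0;
        while (offset < totalLength)
        {
            long length = Math.Min(chunkSize, totalLength - offset);
            ranges.Add(new KeyValuePair<long, long>(offset, length));
            offset += length;
        }
        return ranges;
    }

    // Copies source into a new array chunk by chunk, in parallel.
    // Every chunk is written at its own offset, so completion order does not matter.
    public static byte[] CopyInParallel(byte[] source, int chunkSize, int maxParallelism)
    {
        var destination = new byte[source.Length];
        var ranges = BuildRanges(source.Length, chunkSize);

        Parallel.ForEach(ranges,
            new ParallelOptions { MaxDegreeOfParallelism = maxParallelism },
            range =>
            {
                // Assumption: this copy stands in for the ranged blob download
                // (blob.DownloadRangeToStreamAsync in the snippet above).
                Buffer.BlockCopy(source, (int)range.Key, destination, (int)range.Key, (int)range.Value);
            });

        return destination;
    }
}
```

For a destination stream that only supports appending, an alternative under the same assumptions would be to collect the chunks keyed by offset and write them sequentially in offset order after the parallel download finishes.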

Appreciate all the help!!

Upvotes: 2

Views: 272

Answers (1)

Satya Azure

Reputation: 479

I found the issue with my code. The ADL stream writer was not being flushed and disposed properly. After adding the necessary code, parallelization works fine with both small and large files.
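
To illustrate the kind of fix, here is a minimal sketch that does not use the actual ADL client. It assumes a buffering writer behaves like .NET's BufferedStream, which stands in for the ADL stream here, with a MemoryStream standing in for the destination file. `FlushSketch` and its method names are hypothetical. A small payload stays in the writer's buffer until the writer is flushed or disposed, so the destination sees 0 bytes if neither happens.

```csharp
using System;
using System.IO;

public static class FlushSketch
{
    // Writes a payload through a buffering wrapper and never flushes or disposes it.
    // Returns how many bytes actually reached the underlying stream.
    public static long BytesWrittenWithoutFlush(int payloadSize, int bufferSize)
    {
        var destination = new MemoryStream();
        var writer = new BufferedStream(destination, bufferSize);
        writer.Write(new byte[payloadSize], 0, payloadSize);
        return destination.Length;
    }

    // Same write, but the writer is flushed and disposed via a using block.
    public static long BytesWrittenWithDispose(int payloadSize, int bufferSize)
    {
        var destination = new MemoryStream();
        using (var writer = new BufferedStream(destination, bufferSize))
        {
            writer.Write(new byte[payloadSize], 0, payloadSize);
            writer.Flush();
        }
        // Disposing the wrapper also closes the MemoryStream;
        // ToArray() is still allowed on a closed MemoryStream.
        return destination.ToArray().Length;
    }
}
```

With a 100-byte payload and a 4096-byte buffer, the first method reports 0 bytes at the destination and the second reports 100. Wrapping the real writer in a using block (or calling Flush and Dispose explicitly once the Parallel.ForEach has completed) is the equivalent change in the function.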

Thanks for the suggestions!!

Upvotes: 1
