Download blobs from Azure storage asynchronously and save them in a DataTable

Question

The following code shows how I download blobs from Azure Blob Storage and save them into a DataTable:

foreach (var currIndexGroup in blobsGroupedByIndex)
{
    DataRow dr = dtResult.NewRow();
    foreach (var currIndex in currIndexGroup)
    {       
        long fileByteLength = currIndex.Properties.Length;
        byte[] serializedAndCompressedResult = new byte[fileByteLength];
        currIndex.DownloadToByteArray(serializedAndCompressedResult, 0);
        dr[currIndex.Metadata["columnName"]] = DeflateStream.UncompressString(serializedAndCompressedResult);
    }
    dtResult.Rows.Add(dr);
}

The problem is that the download is pretty slow: 1000 very small blobs take about 20 seconds to download. If I try to run it asynchronously by using currIndex.DownloadToByteArrayAsync(serializedAndCompressedResult, 0); the next line throws the exception Bad state (invalid stored block lengths).

What is the right way to fill this DataTable asynchronously?

Jacob Roberts · Accepted Answer

// The plan here is to make a model that holds your currIndex and its byte array, so you can return that model from a task.
public class MyModel 
{
    public CloudBlockBlob CurrIndex {get;set;} 
    public byte[] FileBytes {get;set;}
}



foreach (var currIndexGroup in blobsGroupedByIndex)
{

    var myTasks = new List<Task<MyModel>>();
    foreach (var currIndex in currIndexGroup)
    {     
        myTasks.Add(Task.Factory.StartNew(() => 
        {
            var myModel = new MyModel();
            myModel.CurrIndex = currIndex;

            long fileByteLength = myModel.CurrIndex.Properties.Length;
            myModel.FileBytes = new byte[fileByteLength];
            currIndex.DownloadToByteArray(myModel.FileBytes, 0);
            return myModel;
        }));
    }
    Task.WaitAll(myTasks.ToArray());

    // Build one row per group, as in the question: each downloaded blob fills one column of that row.
    DataRow dr = dtResult.NewRow();
    foreach (var task in myTasks)
    {
        MyModel myModel = task.Result;
        dr[myModel.CurrIndex.Metadata["columnName"]] = DeflateStream.UncompressString(myModel.FileBytes);
    }
    dtResult.Rows.Add(dr);
}
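
The exception in the question is most likely caused by calling DownloadToByteArrayAsync without awaiting the returned task, so the buffer is decompressed before the download has finished writing to it. If you would rather stay with the async API than wrap the synchronous call in tasks, a minimal sketch could look like the following. It assumes the legacy Microsoft.WindowsAzure.Storage SDK and that DeflateStream.UncompressString comes from Ionic.Zlib (DotNetZip); BlobTableLoader, DownloadCellAsync and FillAsync are made-up names for illustration.

```csharp
using System.Collections.Generic;
using System.Data;
using System.Linq;
using System.Threading.Tasks;
using Ionic.Zlib;                          // assumption: source of DeflateStream.UncompressString
using Microsoft.WindowsAzure.Storage.Blob; // assumption: legacy storage SDK (CloudBlockBlob)

public static class BlobTableLoader // hypothetical helper class
{
    // Downloads one blob and returns its column name together with the decompressed text.
    private static async Task<KeyValuePair<string, string>> DownloadCellAsync(CloudBlockBlob blob)
    {
        var buffer = new byte[blob.Properties.Length];

        // Awaiting guarantees the buffer is completely filled before it is decompressed.
        await blob.DownloadToByteArrayAsync(buffer, 0);

        return new KeyValuePair<string, string>(
            blob.Metadata["columnName"],
            DeflateStream.UncompressString(buffer));
    }

    public static async Task FillAsync(
        IEnumerable<IEnumerable<CloudBlockBlob>> blobsGroupedByIndex,
        DataTable dtResult)
    {
        foreach (var currIndexGroup in blobsGroupedByIndex)
        {
            // Start every download of this group, then wait for all of them together.
            var cells = await Task.WhenAll(
                currIndexGroup.Select(blob => DownloadCellAsync(blob)));

            // The table is only touched here, after the awaits, one group at a time.
            DataRow dr = dtResult.NewRow();
            foreach (var cell in cells)
            {
                dr[cell.Key] = cell.Value;
            }
            dtResult.Rows.Add(dr);
        }
    }
}
```

You would call it as await BlobTableLoader.FillAsync(blobsGroupedByIndex, dtResult); from an async method. Because the rows are built after each Task.WhenAll completes and groups are processed one after another, no locking is needed in this variant.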

You can increase the parallelism further by using Parallel.ForEach on your outer foreach loop. You would then have to lock dtResult when adding rows, because DataTable is not thread-safe for write operations.
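
A rough sketch of that Parallel.ForEach variant, under the same assumptions as the code above (legacy Microsoft.WindowsAzure.Storage SDK, DeflateStream from Ionic.Zlib); ParallelBlobTableLoader, Fill and tableLock are illustrative names, not part of any library:

```csharp
using System.Collections.Generic;
using System.Data;
using System.Threading.Tasks;
using Ionic.Zlib;                          // assumption: source of DeflateStream.UncompressString
using Microsoft.WindowsAzure.Storage.Blob; // assumption: legacy storage SDK (CloudBlockBlob)

public static class ParallelBlobTableLoader // hypothetical helper class
{
    public static void Fill(
        IEnumerable<IEnumerable<CloudBlockBlob>> blobsGroupedByIndex,
        DataTable dtResult)
    {
        var tableLock = new object(); // guards every write to dtResult

        Parallel.ForEach(blobsGroupedByIndex, currIndexGroup =>
        {
            // Download and decompress outside the lock so groups really run in parallel.
            var cells = new List<KeyValuePair<string, string>>();
            foreach (var currIndex in currIndexGroup)
            {
                var buffer = new byte[currIndex.Properties.Length];
                currIndex.DownloadToByteArray(buffer, 0);
                cells.Add(new KeyValuePair<string, string>(
                    currIndex.Metadata["columnName"],
                    DeflateStream.UncompressString(buffer)));
            }

            // Only the short table update is serialized.
            lock (tableLock)
            {
                DataRow dr = dtResult.NewRow();
                foreach (var cell in cells)
                {
                    dr[cell.Key] = cell.Value;
                }
                dtResult.Rows.Add(dr);
            }
        });
    }
}
```

Note that the rows end up in whatever order the groups finish, so if row order matters you would need to sort afterwards or store the group key in a column.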
