Will Sutton

Reputation: 59

InvalidDataException when reading Zip file from Azure Storage blob

I am attempting to download a Zip file from an Azure Storage blob container and process it. Due to the requirements of my use case, I am required to handle the file as a memory stream. The code is as below:

using System;
using System.IO.Compression;
using System.Text;
using Azure.Storage.Blobs;

class Program
{
    async static Task Main(string[] args)
    {
        string blobContainerName = "test-container-001";
        string blobName = "input/report.zip";

        var containerClient = new BlobContainerClient("UseDevelopmentStorage=true", blobContainerName);
        var blobClient = containerClient.GetBlobClient(blobName);

        var content = await blobClient.DownloadContentAsync();
        var file = content.Value.Content.ToString();

        using (var stream = new MemoryStream(Encoding.UTF8.GetBytes(file)))
        using (var zipArchive = new ZipArchive(stream, ZipArchiveMode.Read))
        {
            foreach(var e in zipArchive.Entries)
            {
                // Process files
                Console.WriteLine(e.Name);
            }
        }
    }
}

When the program gets to the loop over the Zip archive entries, I get the following error:

System.IO.InvalidDataException: 'Number of entries expected in End Of Central Directory does not correspond to number of entries in Central Directory.'

I can read the Zip file from my disk using a FileStream and without using Azure Storage or the MemoryStream, but downloading it from Azure using this method causes this error. Could anyone provide advice on solving or diagnosing this?

Upvotes: 1

Views: 102

Answers (1)

Venkatesan

Reputation: 10370


The problem is that the code converts the blob content to a string with content.Value.Content.ToString() and then back to a byte array with Encoding.UTF8.GetBytes(file). A ZIP file is binary data, not text. Any byte sequence that is not valid UTF-8 is replaced while decoding to a string, so the bytes you get back differ from the bytes stored in the blob and the ZIP archive is corrupted.
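To see the corruption without involving Azure at all, here is a minimal sketch of the same bytes -> string -> bytes round trip. The class and method names (Utf8RoundTripDemo, RoundTrip, SurvivesRoundTrip) are made up for this illustration, and it assumes the string conversion in the question decodes the content as UTF-8.

```csharp
using System;
using System.Linq;
using System.Text;

// Hypothetical demo class, not part of the Azure SDK.
static class Utf8RoundTripDemo
{
    // Mimics the question's path: bytes -> string -> Encoding.UTF8.GetBytes.
    public static byte[] RoundTrip(byte[] data)
    {
        string asText = Encoding.UTF8.GetString(data);
        return Encoding.UTF8.GetBytes(asText);
    }

    // True only when the round trip returns exactly the bytes that went in.
    public static bool SurvivesRoundTrip(byte[] data)
    {
        return data.SequenceEqual(RoundTrip(data));
    }
}
```

Calling Utf8RoundTripDemo.SurvivesRoundTrip(new byte[] { 0x50, 0x4B, 0x03, 0x04 }) returns true, because the "PK" signature bytes are plain ASCII. Appending bytes such as 0xFF or a lone 0x80, which cannot appear in that form in valid UTF-8, makes it return false. Real ZIP files are full of such bytes (CRC values, compressed data), so the archive does not survive the conversion.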

Here is code that correctly downloads and processes the ZIP file as a memory stream.

Code:

using System;
using System.IO;
using System.IO.Compression;
using System.Threading.Tasks;
using Azure.Storage.Blobs;

class Program
{
    async static Task Main(string[] args)
    {
        string blobContainerName = "test";
        string blobName = "data.zip";

        var containerClient = new BlobContainerClient("<your storage connection string>", blobContainerName);
        var blobClient = containerClient.GetBlobClient(blobName);
        
        var downloadResponse = await blobClient.DownloadStreamingAsync();

        using (var memoryStream = new MemoryStream())
        {
            // Copy the blob content into the memory stream
            await downloadResponse.Value.Content.CopyToAsync(memoryStream);
            memoryStream.Position = 0;

            using (var zipArchive = new ZipArchive(memoryStream, ZipArchiveMode.Read))
            {
                foreach (var entry in zipArchive.Entries)
                {
                    // Process each entry
                    Console.WriteLine(entry.Name);
                }
            }
        }
    }
}
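If you would rather keep DownloadContentAsync from the question, the content it returns is a BinaryData, which can hand back the raw bytes via ToArray() (or a stream via ToStream()) without going through a string. Below is a rough sketch of the ZIP-reading half only. ZipBytesReader and ListEntryNames are hypothetical names for this example, and the helper deliberately takes a plain byte[] so it does not depend on the Azure SDK.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

// Hypothetical helper, not part of the Azure SDK.
static class ZipBytesReader
{
    // Lists the entry names of a ZIP archive held in memory as raw bytes.
    public static List<string> ListEntryNames(byte[] zipBytes)
    {
        var names = new List<string>();

        // Wrap the existing array in a read-only stream; no text decoding is involved.
        using (var stream = new MemoryStream(zipBytes, writable: false))
        using (var archive = new ZipArchive(stream, ZipArchiveMode.Read))
        {
            foreach (var entry in archive.Entries)
            {
                names.Add(entry.FullName);
            }
        }

        return names;
    }
}
```

With the question's variables, the call would look something like ZipBytesReader.ListEntryNames(content.Value.Content.ToArray()). Note that this holds the whole blob in memory, the same as the MemoryStream approach above, so for very large archives the streaming download may be the better fit.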

Output:

._sample-1.webp
._sample-1_1.webp
._sample-5.webp
._sample-5 (1).jpg


Reference: Download a blob with .NET - Azure Storage | Microsoft Learn

Upvotes: 3
