Aniruddha

Reputation: 1039

Any reason why this code may have corrupted a couple of files while creating a zip file?

The following code creates a zip file from S3 objects by pulling them into memory and writing the final product to a file on disk. However, it has been observed to corrupt a few files (out of thousands) while creating the zip. I've checked, and there is nothing wrong with the files that got corrupted during the process, because the same files get zipped properly by other means. Any suggestions for fine-tuning the code?

Code:

public static async Task S3ToZip(List<string> pdfBatch, string zipPath, IAmazonS3 s3Client)
{
    FileStream fileStream = new FileStream(zipPath, FileMode.OpenOrCreate, FileAccess.ReadWrite, FileShare.ReadWrite);
    using (ZipArchive archive = new ZipArchive(fileStream, ZipArchiveMode.Update, true))
    {
        foreach (var file in pdfBatch)
        {
            GetObjectRequest request = new GetObjectRequest
            {
                BucketName = "sample-bucket",
                Key = file
            };
            using GetObjectResponse response = await s3Client.GetObjectAsync(request);
            using Stream responseStream = response.ResponseStream;
            ZipArchiveEntry zipFileEntry = archive.CreateEntry(file.Split('/')[^1]);
            using Stream zipEntryStream = zipFileEntry.Open();
            await responseStream.CopyToAsync(zipEntryStream);
            zipEntryStream.Seek(0, SeekOrigin.Begin);
            zipEntryStream.CopyTo(fileStream);
        }
        archive.Dispose();
        fileStream.Close();
    }
}

Upvotes: 0

Views: 831

Answers (1)

Alexey Rumyantsev

Reputation: 523

Don't call Dispose() or Close() explicitly; let using do the whole job. You also don't need to write anything to fileStream: writing to the ZipArchiveEntry stream does that under the hood. Use FileMode.Create to guarantee that the file is always truncated before you write to it (FileMode.OpenOrCreate leaves the contents of an existing file in place). And since you are only creating an archive, not updating one, use ZipArchiveMode.Create to enable memory-efficient streaming (thanks to @canton7 for some deep diving into the details of the zip archive format).

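public static async Task S3ToZip(List<string> pdfBatch, string zipPath, IAmazonS3 s3Client)
{
    // FileMode.Create truncates an existing file, so no stale bytes are left behind.
    using FileStream fileStream = new FileStream(zipPath, FileMode.Create, FileAccess.ReadWrite, FileShare.ReadWrite);
    // ZipArchiveMode.Create streams each entry to fileStream as it is written.
    using ZipArchive archive = new ZipArchive(fileStream, ZipArchiveMode.Create, true);

    foreach (var file in pdfBatch)
    {
        GetObjectRequest request = new GetObjectRequest
        {
            BucketName = "sample-bucket",
            Key = file
        };
        using GetObjectResponse response = await s3Client.GetObjectAsync(request);
        using Stream responseStream = response.ResponseStream;
        // The entry is named after the last segment of the S3 key.
        ZipArchiveEntry zipFileEntry = archive.CreateEntry(file.Split('/')[^1]);
        using Stream zipEntryStream = zipFileEntry.Open();
        // Copying into the entry stream is the only write needed; never write to fileStream directly.
        await responseStream.CopyToAsync(zipEntryStream);
    }
    // The using declarations dispose archive first, then fileStream, when the method exits.
}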
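
To check that an archive written this way is intact, one option is to reopen it in ZipArchiveMode.Read and compare every entry with its source. The following is only a minimal sketch of that idea, not part of the original code: the class and method names (ZipRoundTrip, BuildZipAsync, ReadZip) are hypothetical, and in-memory byte arrays stand in for the S3 response streams so that it runs without AWS.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Threading.Tasks;

// Hypothetical helper for illustration; byte arrays stand in for S3 objects.
public static class ZipRoundTrip
{
    // Writes each (name, content) pair as one entry using ZipArchiveMode.Create.
    public static async Task<byte[]> BuildZipAsync(IEnumerable<KeyValuePair<string, byte[]>> files)
    {
        using var output = new MemoryStream();
        // leaveOpen: true keeps 'output' usable after the archive is disposed.
        using (var archive = new ZipArchive(output, ZipArchiveMode.Create, leaveOpen: true))
        {
            foreach (var file in files)
            {
                ZipArchiveEntry entry = archive.CreateEntry(file.Key);
                using Stream entryStream = entry.Open();
                using var source = new MemoryStream(file.Value);
                // Same pattern as above: copy into the entry stream and nothing else.
                await source.CopyToAsync(entryStream);
            }
        } // Disposing the archive writes the central directory.
        return output.ToArray();
    }

    // Reopens the archive read-only and returns entry name -> uncompressed bytes.
    public static Dictionary<string, byte[]> ReadZip(byte[] zipBytes)
    {
        var result = new Dictionary<string, byte[]>();
        using var input = new MemoryStream(zipBytes);
        using var archive = new ZipArchive(input, ZipArchiveMode.Read);
        foreach (ZipArchiveEntry entry in archive.Entries)
        {
            using Stream entryStream = entry.Open();
            using var buffer = new MemoryStream();
            entryStream.CopyTo(buffer);
            result[entry.FullName] = buffer.ToArray();
        }
        return result;
    }
}
```

For the real S3 batch, the same read-back could be run against the file at zipPath, for example by comparing the number of entries with pdfBatch.Count and each entry's Length with the size of the corresponding object.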

Upvotes: 3
