Why might this code corrupt a couple of files while creating a zip file?

Question

The following code creates a zip file from objects in S3 by pulling them into memory and writing the final product to a file on disk. However, it was observed to corrupt a few files (out of thousands) while creating the zip. I've checked, and there is nothing wrong with the files that got corrupted, because the same files are zipped properly by other means. Any suggestions for fine-tuning the code?

Code:

public static async Task S3ToZip(List<string> pdfBatch, string zipPath, IAmazonS3 s3Client)
{
    FileStream fileStream = new FileStream(zipPath, FileMode.OpenOrCreate, FileAccess.ReadWrite, FileShare.ReadWrite);
    using (ZipArchive archive = new ZipArchive(fileStream, ZipArchiveMode.Update, true))
    {
        foreach (var file in pdfBatch)
        {
            GetObjectRequest request = new GetObjectRequest
            {
                BucketName = "sample-bucket",
                Key = file
            };
            using GetObjectResponse response = await s3Client.GetObjectAsync(request);
            using Stream responseStream = response.ResponseStream;
            ZipArchiveEntry zipFileEntry = archive.CreateEntry(file.Split('/')[^1]);
            using Stream zipEntryStream = zipFileEntry.Open();
            await responseStream.CopyToAsync(zipEntryStream);
            zipEntryStream.Seek(0, SeekOrigin.Begin);
            zipEntryStream.CopyTo(fileStream);
        }
        archive.Dispose();
        fileStream.Close();
    }
}

Alexey Rumyantsev · Accepted Answer

Don't call Dispose() or Close() explicitly; let using do the whole job. You also don't need to write anything to fileStream yourself: writing to the ZipArchiveEntry stream does that under the hood. Use FileMode.Create to guarantee that the file is always truncated before you write to it. And since you are only creating an archive, not updating one, use ZipArchiveMode.Create to enable memory-efficient streaming (thanks to @canton7 for some deep diving into the details of the zip archive format).

public static async Task S3ToZip(List<string> pdfBatch, string zipPath, IAmazonS3 s3Client)
{
    using FileStream fileStream = new FileStream(zipPath, FileMode.Create, FileAccess.ReadWrite, FileShare.ReadWrite);
    using ZipArchive archive = new ZipArchive(fileStream, ZipArchiveMode.Create, true);
    
    foreach (var file in pdfBatch)
    {
        GetObjectRequest request = new GetObjectRequest
        {
            BucketName = "sample-bucket",
            Key = file
        };
        using GetObjectResponse response = await s3Client.GetObjectAsync(request);
        using Stream responseStream = response.ResponseStream;
        ZipArchiveEntry zipFileEntry = archive.CreateEntry(file.Split('/')[^1]);
        using Stream zipEntryStream = zipFileEntry.Open();
        await responseStream.CopyToAsync(zipEntryStream);
    }
}
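
To check the pattern without touching S3, here is a minimal local sketch. It assumes in-memory byte arrays as stand-ins for the S3 response streams; the keys, contents, and temp-file path are made up for illustration. It writes the archive the same way as above (FileMode.Create plus ZipArchiveMode.Create, writing only to the entry stream), then reopens the zip and compares every entry with its source bytes.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Text;

// Hypothetical stand-ins for the S3 objects: key -> content.
var sources = new Dictionary<string, byte[]>
{
    ["batch/a.pdf"] = Encoding.UTF8.GetBytes("first file"),
    ["batch/b.pdf"] = Encoding.UTF8.GetBytes("second file"),
};

// Illustrative output path in the temp directory.
string zipPath = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName() + ".zip");

// Write phase: truncate the target file and stream each source into its entry.
// Nothing is written to fileStream directly; the entry stream handles that.
using (var fileStream = new FileStream(zipPath, FileMode.Create, FileAccess.Write))
using (var archive = new ZipArchive(fileStream, ZipArchiveMode.Create))
{
    foreach (var (key, content) in sources)
    {
        using var source = new MemoryStream(content);
        ZipArchiveEntry entry = archive.CreateEntry(key.Split('/')[^1]);
        // In Create mode only one entry may be open at a time; the using
        // declaration closes this stream at the end of each iteration.
        using Stream entryStream = entry.Open();
        source.CopyTo(entryStream);
    }
}

// Read phase: reopen the archive and collect the decompressed bytes per entry.
var roundTripped = new Dictionary<string, byte[]>();
using (ZipArchive readBack = ZipFile.OpenRead(zipPath))
{
    foreach (ZipArchiveEntry entry in readBack.Entries)
    {
        using Stream entryStream = entry.Open();
        using var buffer = new MemoryStream();
        entryStream.CopyTo(buffer);
        roundTripped[entry.Name] = buffer.ToArray();
    }
}
File.Delete(zipPath);

// Every source must appear in the archive with identical bytes.
bool allMatch = sources.All(kv =>
    roundTripped.TryGetValue(kv.Key.Split('/')[^1], out var bytes)
    && bytes.SequenceEqual(kv.Value));
Console.WriteLine(allMatch ? "all entries match" : "mismatch detected");
```

A read-back comparison like this is also a cheap way to catch corrupted entries in a real batch before the zip is shipped anywhere.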
