The following code creates a zip file from S3 objects by pulling them into memory and writing the final product to a file on disk. However, I have observed that it corrupts a few files (out of thousands) while creating the zip. I've checked, and there is nothing wrong with the files that get corrupted, because the same files get zipped properly by other means. Any suggestions for fixing the code?
Code:
public static async Task S3ToZip(List<string> pdfBatch, string zipPath, IAmazonS3 s3Client)
{
    FileStream fileStream = new FileStream(zipPath, FileMode.OpenOrCreate, FileAccess.ReadWrite, FileShare.ReadWrite);
    using (ZipArchive archive = new ZipArchive(fileStream, ZipArchiveMode.Update, true))
    {
        foreach (var file in pdfBatch)
        {
            GetObjectRequest request = new GetObjectRequest
            {
                BucketName = "sample-bucket",
                Key = file
            };
            using GetObjectResponse response = await s3Client.GetObjectAsync(request);
            using Stream responseStream = response.ResponseStream;
            ZipArchiveEntry zipFileEntry = archive.CreateEntry(file.Split('/')[^1]);
            using Stream zipEntryStream = zipFileEntry.Open();
            await responseStream.CopyToAsync(zipEntryStream);
            zipEntryStream.Seek(0, SeekOrigin.Begin);
            zipEntryStream.CopyTo(fileStream);
        }
        archive.Dispose();
        fileStream.Close();
    }
}
Don't call Dispose() or Close() explicitly; let using do all the work. You also don't need to write anything to fileStream: writing to the ZipArchiveEntry stream already does that under the hood, and copying the entry data into fileStream yourself writes stray bytes into the same file the archive is writing. You also need to use FileMode.Create to guarantee that your file is always truncated before writing to it. Finally, since you are only creating an archive, not updating one, you should use ZipArchiveMode.Create to enable memory-efficient streaming (thanks to @canton7 for some deep diving into the details of the zip archive format).
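The FileMode point is easy to check in isolation. This small sketch (the TruncationDemo and LengthAfterRewrite names are hypothetical, and it assumes a writable temporary file) first writes a 100-byte file, then reopens it with the given mode and writes only 10 bytes. With FileMode.OpenOrCreate the old trailing 90 bytes stay in place, so the file is still 100 bytes long; with FileMode.Create the file is truncated first and ends up 10 bytes long. A leftover tail like that after a smaller zip is exactly what you do not want in an archive file.

```csharp
using System.IO;

public static class TruncationDemo
{
    // Hypothetical helper: writes a 100-byte file at `path`, reopens it with `mode`,
    // writes 10 bytes from the start, and returns the resulting file length.
    public static long LengthAfterRewrite(string path, FileMode mode)
    {
        File.WriteAllBytes(path, new byte[100]); // stands in for an older, larger zip

        using (var fs = new FileStream(path, mode, FileAccess.Write))
        {
            fs.Write(new byte[10], 0, 10); // stands in for the new, smaller zip
        }

        // OpenOrCreate keeps the old bytes past position 10; Create truncated them first.
        return new FileInfo(path).Length;
    }
}
```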
public static async Task S3ToZip(List<string> pdfBatch, string zipPath, IAmazonS3 s3Client)
{
    // FileMode.Create truncates any existing file, so no stale bytes survive from a previous run.
    using FileStream fileStream = new FileStream(zipPath, FileMode.Create, FileAccess.ReadWrite, FileShare.ReadWrite);
    // ZipArchiveMode.Create streams each entry straight to fileStream instead of buffering the whole archive.
    using ZipArchive archive = new ZipArchive(fileStream, ZipArchiveMode.Create, true);
    foreach (var file in pdfBatch)
    {
        GetObjectRequest request = new GetObjectRequest
        {
            BucketName = "sample-bucket",
            Key = file
        };
        using GetObjectResponse response = await s3Client.GetObjectAsync(request);
        using Stream responseStream = response.ResponseStream;
        ZipArchiveEntry zipFileEntry = archive.CreateEntry(file.Split('/')[^1]);
        // The entry stream is disposed at the end of each iteration, which finishes that entry.
        using Stream zipEntryStream = zipFileEntry.Open();
        await responseStream.CopyToAsync(zipEntryStream);
    }
}
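If you want to check the pattern without AWS credentials, here is a minimal sketch of the same approach. It assumes the object contents are already available as in-memory byte arrays (a hypothetical stand-in for GetObjectAsync and ResponseStream), and the ZipSketch and WriteZipAsync names are made up for illustration. Every entry is written only through its entry stream in ZipArchiveMode.Create, and the underlying output stream is never written to directly.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Threading.Tasks;

public static class ZipSketch
{
    // Hypothetical stand-in for the S3 download: each source is a (key, content) pair
    // supplied by the caller instead of being fetched with GetObjectAsync.
    public static async Task WriteZipAsync(IEnumerable<KeyValuePair<string, byte[]>> sources, Stream output)
    {
        // Create mode writes entries sequentially into `output`; leaveOpen lets the caller keep using it.
        using (var archive = new ZipArchive(output, ZipArchiveMode.Create, leaveOpen: true))
        {
            foreach (var source in sources)
            {
                // Same naming rule as the answer: keep only the last path segment of the key.
                string name = source.Key.Split('/')[^1];
                ZipArchiveEntry entry = archive.CreateEntry(name);

                // Only one entry may be open at a time in Create mode; both streams are
                // disposed at the end of this iteration, which completes the entry.
                using Stream entryStream = entry.Open();
                using var input = new MemoryStream(source.Value);
                await input.CopyToAsync(entryStream);
                // No Seek and no second copy into `output`: the archive does that itself.
            }
        } // Disposing the archive writes the central directory.
    }
}
```

Reading the result back with ZipArchiveMode.Read should return the original bytes for each entry. Swapping the MemoryStream input for response.ResponseStream and the output for a FileStream opened with FileMode.Create gives essentially the answer's code.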