Reputation: 3265
I am writing functionality for our web server which should download several files from other servers, and return them as a zip archive without compression.
How can I determine the final size of the ZIP archive if I know the sizes of all downloaded files?
This is the code I am working on at the moment. The commented-out line corrupted the ZIP archive.
public void Download()
{
    var urls = Request.Headers["URLS"].Split(';');
    Task<WebResponse>[] responseTasks = urls
        .Select(it =>
        {
            var request = WebRequest.Create(it);
            return Task.Factory.FromAsync<WebResponse>(request.BeginGetResponse(null, null), request.EndGetResponse);
        })
        .ToArray();
    Task.WaitAll(responseTasks);
    var webResponses = responseTasks.Where(it => it.Exception == null).Select(it => it.Result);
    var totalSize = webResponses.Sum(it => it.ContentLength + 32);
    Response.ContentType = "application/zip";
    Response.CacheControl = "Private";
    Response.Cache.SetCacheability(HttpCacheability.NoCache);
    // Response.AddHeader("Content-Length", totalSize.ToString(CultureInfo.InvariantCulture));
    var sortedResponses = webResponses.OrderBy(it => it.ContentLength);
    var buffer = new byte[32 * 1024];
    using (var zipOutput = new ZipOutputStream(Response.OutputStream))
    {
        zipOutput.SetLevel(0);
        foreach (var response in sortedResponses)
        {
            var dataStream = response.GetResponseStream();
            var ze = new ZipEntry(Guid.NewGuid().ToString() + ".jpg");
            zipOutput.PutNextEntry(ze);
            int read;
            while ((read = dataStream.Read(buffer, 0, buffer.Length)) > 0)
            {
                zipOutput.Write(buffer, 0, read);
                Response.Flush();
            }
            if (!Response.IsClientConnected)
            {
                break;
            }
        }
        zipOutput.Finish();
    }
    Response.Flush();
    Response.End();
}
Upvotes: 4
Views: 4892
Reputation: 1590
I had the same problem and ended up creating a fake archive and tracking its size.
This has the advantage that it should work with any implementation (such as the one in System.IO.Compression, which has many branches depending on the file name encoding or file size).
The important part is using Stream.Null instead of a MemoryStream, so no memory is used for the calculation.
public long Size(FileItem[] files)
{
    using (var ms = new PositionWrapperStream(Stream.Null))
    {
        using (var archive = new ZipArchive(ms, ZipArchiveMode.Create, true))
        {
            foreach (var file in files)
            {
                var entry = archive.CreateEntry(file.Name, CompressionLevel.NoCompression);
                using (var zipStream = entry.Open())
                {
                    WriteZero(zipStream, file.Length); // the actual content does not matter
                }
            }
        }
        // Read the position only after the archive is disposed,
        // because the central directory is written on dispose.
        return ms.Position;
    }
}
private void WriteZero(Stream target, long count)
{
    byte[] buffer = new byte[1024]; // zero-filled by default
    while (count > 0)
    {
        int chunk = (int) Math.Min(buffer.Length, count);
        target.Write(buffer, 0, chunk);
        count -= chunk;
    }
}
The PositionWrapperStream is a simple wrapper that just tracks the position:
class PositionWrapperStream : Stream
{
    private readonly Stream wrapped;
    private long pos = 0; // long, so archives larger than 2 GB do not overflow
    public PositionWrapperStream(Stream wrapped)
    {
        this.wrapped = wrapped;
    }
    public override bool CanSeek { get { return false; } }
    public override bool CanWrite { get { return true; } }
    public override long Position
    {
        get { return pos; }
        set { throw new NotSupportedException(); }
    }
    public override void Write(byte[] buffer, int offset, int count)
    {
        pos += count;
        wrapped.Write(buffer, offset, count);
    }
    //...other methods with throw new NotSupportedException();
}
Upvotes: 1
Reputation: 873
I had the same problem and, after reading the ZIP spec, came up with the following solution:
zip_size = num_of_files * (30 + 16 + 46) + 2 * total_length_of_filenames + total_size_of_files + 22
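where:
30 = size of the local file header (fixed part)
16 = size of the data descriptor (optional)
46 = size of the central directory file header (fixed part)
22 = size of the end of central directory record (EOCD)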
However, this does not account for comments on individual files or on the archive as a whole, and it assumes the store method (level 0, no compression).
This works for the ZIP implementation I've written. As nickolay-olshevsky pointed out, other compressors might do things a little differently.
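The formula above can be written as a small helper. This is a minimal sketch, not a definitive implementation: the class and method names are hypothetical, and it assumes every entry is stored uncompressed with a data descriptor, names are encoded as UTF-8, and there are no extra fields, no comments, and no ZIP64 records.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper illustrating the formula; not part of any library.
public static class ZipSizeEstimator
{
    private const int LocalFileHeaderSize = 30;        // fixed part
    private const int DataDescriptorSize = 16;         // optional record, assumed present
    private const int CentralDirectoryHeaderSize = 46; // fixed part
    private const int EndOfCentralDirectorySize = 22;  // written once per archive

    public static long EstimateStoredZipSize(IEnumerable<(string Name, long Length)> files)
    {
        long total = EndOfCentralDirectorySize;
        foreach (var file in files)
        {
            // The name is written twice: in the local header and in the central directory.
            long nameBytes = Encoding.UTF8.GetByteCount(file.Name);
            total += LocalFileHeaderSize + DataDescriptorSize + CentralDirectoryHeaderSize
                     + 2 * nameBytes + file.Length;
        }
        return total;
    }
}
```

For the 40-character names generated in the question (a GUID plus ".jpg"), this gives 92 + 2 * 40 = 172 bytes of overhead per file plus 22 bytes per archive, provided the library in use writes exactly these records.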
Upvotes: 8
Reputation: 14160
A ZIP file is composed of some per-file records plus some per-archive records. They have a complicated structure and can differ in size depending on the archiver used. However, if you use the same implementation with the same compression options, the archive size will depend only on the sizes of the input files and the lengths of their names.
So you can create archives with 1 and 2 files and, knowing their sizes plus the input file sizes and file name lengths, calculate the per-archive overhead, the per-file overhead, and how the archive size depends on the file name length (the file name is stored in two places).
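The arithmetic behind this calibration can be sketched as follows. The names are hypothetical, and the sketch assumes three sample archives were measured beforehand with the same library, options, and output stream type as in production: one entry with a 1-byte name, two entries with 1-byte names, and one entry with a 2-byte name, each entry holding the same number of data bytes. It also assumes the archive size is linear in file count, name length, and data size, which may not hold for every implementation (for example once ZIP64 records are needed).

```csharp
using System;

// Hypothetical helper: solves for the overhead constants from three measured archive sizes.
public static class ZipOverheadCalibration
{
    public static (long PerArchive, long PerFile, long PerNameByte) Solve(
        long oneFileShortName,   // measured: 1 entry, 1-byte name
        long twoFilesShortNames, // measured: 2 entries, 1-byte names
        long oneFileLongerName,  // measured: 1 entry, 2-byte name
        long dataBytesPerEntry)  // data bytes written to every sample entry
    {
        // One extra name byte is the only difference between these two archives.
        long perNameByte = oneFileLongerName - oneFileShortName;
        // Adding a second entry costs one per-file overhead, one name byte cost, and its data.
        long perFile = twoFilesShortNames - oneFileShortName - perNameByte - dataBytesPerEntry;
        // Whatever remains of the single-entry archive is the per-archive overhead.
        long perArchive = oneFileShortName - perFile - perNameByte - dataBytesPerEntry;
        return (perArchive, perFile, perNameByte);
    }

    public static long Predict(
        (long PerArchive, long PerFile, long PerNameByte) c,
        int fileCount, long totalNameBytes, long totalDataBytes)
    {
        return c.PerArchive + fileCount * c.PerFile
               + c.PerNameByte * totalNameBytes + totalDataBytes;
    }
}
```

As an illustration with made-up rather than measured sizes: if the three sample archives (10 data bytes per entry) came out at 126, 230, and 128 bytes, Solve would yield 22 bytes per archive, 92 bytes per file, and 2 bytes per name byte, which happens to match the constants in the formula given in the other answer.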
Upvotes: 2