v00d00

Reputation: 3265

Calculate size of zip file with compression level 0

I am writing functionality for our web server which should download several files from other servers, and return them as a zip archive without compression.

How can I determine the final size of the ZIP archive if I know the sizes of all downloaded files?

This is the code I am working on at the moment. The commented-out line corrupts the ZIP archive when enabled.

public void Download()
{
    var urls = Request.Headers["URLS"].Split(';');
    Task<WebResponse>[] responseTasks = urls
        .Select(it =>
        {
            var request = WebRequest.Create(it);
            return Task.Factory.FromAsync<WebResponse>(request.BeginGetResponse(null, null), request.EndGetResponse);
        })
        .ToArray();

    Task.WaitAll(responseTasks);

    var webResponses = responseTasks.Where(it => it.Exception == null).Select(it => it.Result);

    var totalSize = webResponses.Sum(it => it.ContentLength + 32);

    Response.ContentType = "application/zip";
    Response.CacheControl = "Private";
    Response.Cache.SetCacheability(HttpCacheability.NoCache);
    // Response.AddHeader("Content-Length", totalSize.ToString(CultureInfo.InvariantCulture));

    var sortedResponses = webResponses.OrderBy(it => it.ContentLength);

    var buffer = new byte[32 * 1024];

    using (var zipOutput = new ZipOutputStream(Response.OutputStream))
    {
        zipOutput.SetLevel(0);

        foreach (var response in sortedResponses)
        {
            var dataStream = response.GetResponseStream();

            var ze = new ZipEntry(Guid.NewGuid().ToString() + ".jpg");
            zipOutput.PutNextEntry(ze);

            int read;
            while ((read = dataStream.Read(buffer, 0, buffer.Length)) > 0)
            {
                zipOutput.Write(buffer, 0, read);
                Response.Flush();
            }

            if (!Response.IsClientConnected)
            {
                break;
            }
        }

        zipOutput.Finish();
    }

    Response.Flush();
    Response.End();
}

Upvotes: 4

Views: 4892

Answers (3)

binco

Reputation: 1590

I had the same problem and ended up creating a fake archive and tracking its size.

This has the advantage that it should work with any implementation (such as the one in System.IO.Compression, which has many branches depending on the file name encoding or file size).

The important part is using Stream.Null instead of a MemoryStream, so no memory is used for the calculation.

public long Size(FileItem[] files)
{
    using (var ms = new PositionWrapperStream(Stream.Null))
    {
        using (var archive = new ZipArchive(ms, ZipArchiveMode.Create, true))
        {
            foreach (var file in files)
            {
                var entry = archive.CreateEntry(file.Name, CompressionLevel.NoCompression);
                using (var zipStream = entry.Open())
                {
                    WriteZero(zipStream, file.Length);//the actual content does not matter
                }
            }
        }
        return ms.Position;
    }
}

private void WriteZero(Stream target, long count)
{
    byte[] buffer = new byte[1024];
    while (count > 0)
    {
        target.Write(buffer, 0, (int) Math.Min(buffer.Length, count));
        count -= buffer.Length;
    }
}

PositionWrapperStream is a simple wrapper that only tracks the number of bytes written:

class PositionWrapperStream : Stream
{
    private readonly Stream wrapped;

    private long pos = 0; // long, so archives larger than 2 GB do not overflow the counter

    public PositionWrapperStream(Stream wrapped)
    {
        this.wrapped = wrapped;
    }

    public override bool CanSeek { get { return false; } }

    public override bool CanWrite { get { return true; } }

    public override long Position
    {
        get { return pos; }
        set { throw new NotSupportedException(); }
    }

    public override void Write(byte[] buffer, int offset, int count)
    {
        pos += count;
        wrapped.Write(buffer, offset, count);
    }

    //...other methods with throw new NotSupportedException(); 
}

Upvotes: 1

derflocki

Reputation: 873

I had the same problem and, after reading the ZIP specification, came up with the following formula:

zip_size = num_of_files * (30 + 16 + 46) + 2 * total_length_of_filenames + total_size_of_files + 22

with:

  • 30: Fixed part of the Local file header
  • 16: Optional: Size of the Data descriptor
  • 46: Fixed part of the Central directory file header
  • 22: Fixed part of the End of central directory record (EOCD)

However, this does not account for extra fields, per-file comments, or the archive comment, and it assumes the store method (level 0).

This works for the ZIP implementation I've written. As nickolay-olshevsky pointed out, other compressors might do things a little differently.
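
The formula can be sketched in C# as follows. This is only a minimal sketch under stated assumptions: the class and method names (ZipSizeEstimate, StoredZipSize) are hypothetical, file names are assumed to be stored as UTF-8, and extra fields, comments and ZIP64 records (needed for very large files or archives) are ignored. Whether the 16-byte data descriptor is written depends on the archiver, so it is a parameter here.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class ZipSizeEstimate
{
    private const int LocalHeaderFixed = 30;      // fixed part of the local file header
    private const int DataDescriptor = 16;        // optional data descriptor
    private const int CentralHeaderFixed = 46;    // fixed part of the central directory file header
    private const int EndOfCentralDirectory = 22; // fixed part of the EOCD record

    // files: pairs of (file name, uncompressed size in bytes).
    // Assumption: names are encoded as UTF-8; adjust if your archiver uses another encoding.
    public static long StoredZipSize(
        IEnumerable<KeyValuePair<string, long>> files,
        bool withDataDescriptor = true)
    {
        long total = EndOfCentralDirectory;
        foreach (var file in files)
        {
            long nameBytes = Encoding.UTF8.GetByteCount(file.Key);

            // The name is stored twice: in the local header and in the central directory.
            total += LocalHeaderFixed + CentralHeaderFixed + 2 * nameBytes + file.Value;

            if (withDataDescriptor)
            {
                total += DataDescriptor;
            }
        }
        return total;
    }
}
```

For a single 100-byte file named "a.jpg" this gives 22 + 30 + 46 + 16 + 2 * 5 + 100 = 224 bytes. Before sending the result as Content-Length, it is worth comparing it once against a real archive produced by the library in use.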

Upvotes: 8

Nickolay Olshevsky

Reputation: 14160

A ZIP file is composed of per-file records plus per-archive records. They have a complicated structure and can differ in size depending on the archiver used. However, if you use the same implementation with the same compression options, the archive size depends only on the sizes of the input files and the lengths of their names.

So you can create archives with one and two files and, knowing their sizes together with the input file sizes and file name lengths, calculate the per-archive overhead, the per-file overhead, and how the archive size depends on the file name (the file name is stored in two places).
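
The arithmetic behind this calibration might look like the sketch below. It assumes the model size = perArchive + n * perFile + 2 * nameBytes + dataBytes, which only holds for uncompressed entries and for an archiver whose overhead does not vary with the input. The class and method names are hypothetical, and the sizes of the two sample archives have to be measured with the actual library.

```csharp
using System;

public static class ZipOverheadCalibration
{
    // size1: measured size of an archive holding one file.
    // size2: measured size of an archive holding two files.
    // nameBytes*/dataBytes*: total encoded name length and total content size per archive.
    public static (long PerArchive, long PerFile) Solve(
        long size1, long nameBytes1, long dataBytes1,
        long size2, long nameBytes2, long dataBytes2)
    {
        // Remove the parts that are already known: names (stored twice) and raw data.
        long rest1 = size1 - 2 * nameBytes1 - dataBytes1; // perArchive + 1 * perFile
        long rest2 = size2 - 2 * nameBytes2 - dataBytes2; // perArchive + 2 * perFile

        long perFile = rest2 - rest1;
        long perArchive = rest1 - perFile;
        return (perArchive, perFile);
    }
}
```

If the two sample archives measure 224 and 380 bytes for name lengths of 5 and 12 bytes and contents of 100 and 150 bytes, the result is a per-archive overhead of 22 bytes and a per-file overhead of 92 bytes.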

Upvotes: 2
