Adrian Zanescu

Reputation: 8008

Appending to a compressed file using GZipStream

I'm trying to use GZipStream to write application traces (they tend to grow to huge sizes in production), so I need to be able to open an existing file and append to it using GZipStream. Everything seems to work until I try to decompress the file: on decompression, GZipStream reads only the first chunk of data and then behaves as if it had reached EOF, even though the file contains a lot more. The strange thing is that when I open the file with Windows or WinRAR, all the data is present and extracts properly. Has anyone encountered this issue before?
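A minimal sketch of the append pattern described above, written as a .NET 6+ console program with top-level statements. The file name `traces.log.gz` and the `AppendTrace` helper are hypothetical. Each append session wraps the file in a new GZipStream, so the file ends up as complete gzip members written back to back rather than one continuous stream; the program prints the gzip magic bytes (`1f 8b`) found at the start of each member.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical trace file path, used only for this sketch.
string path = Path.Combine(Path.GetTempPath(), "traces.log.gz");
File.Delete(path); // start from scratch; no error if the file is missing

AppendTrace(path, "first session\n");
long firstMemberLength = new FileInfo(path).Length;
AppendTrace(path, "second session\n");

// Both members start with the gzip magic bytes 0x1f 0x8b.
byte[] bytes = File.ReadAllBytes(path);
Console.WriteLine($"file: {bytes.Length} bytes, second member at offset {firstMemberLength}");
Console.WriteLine($"magic at offset 0: {bytes[0]:x2} {bytes[1]:x2}");
Console.WriteLine($"magic at second member: {bytes[firstMemberLength]:x2} {bytes[firstMemberLength + 1]:x2}");

// Each call opens the file in append mode and wraps it in a new GZipStream.
// Disposing the GZipStream writes a complete gzip member (header, deflate
// data, CRC-32 and size trailer), so the file becomes a series of members.
static void AppendTrace(string filePath, string text)
{
    using FileStream fs = new(filePath, FileMode.Append, FileAccess.Write);
    using GZipStream gz = new(fs, CompressionLevel.Optimal);
    byte[] data = Encoding.UTF8.GetBytes(text);
    gz.Write(data, 0, data.Length);
}
```

A reader that stops after the first member, as described in the question, will therefore see only "first session".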

Upvotes: 5

Views: 2621

Answers (3)

Igor Be

Reputation: 908

I've encountered the same problem. The idea is to implement something like what http://zlib.net/pigz/ does.

The idea is to remove the last 8 bytes of the old gzip chunk (the footer), extract the CRC and size from that footer, then add some zeroes, then append the new chunk (with its header removed first), and finally recalculate the total size and CRC from the old and new chunks' sizes and CRCs and write the resulting footer. The problem is that I didn't find how to compute a valid combined CRC from the CRCs of the two parts.
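The missing step can be handled because CRC-32 is linear over GF(2): the CRC of A followed by B equals the raw CRC register (no initial value, no final XOR) after feeding len(B) zero bytes starting from crc(A), XORed with crc(B). zlib exposes this as `crc32_combine`, which runs in logarithmic time; the sketch below is a naive linear-time version in C#, and the helper names (`Crc32`, `Crc32Combine`, `Gzip`, `ReadTrailer`) are hypothetical. It reads each member's 8-byte trailer (CRC-32 then ISIZE, both little-endian, per RFC 1952) and computes the values a merged trailer would need. It covers only the trailer arithmetic, not the deflate-level splicing, and it assumes B is shorter than 4 GiB because ISIZE is stored mod 2^32.

```csharp
using System;
using System.Buffers.Binary;
using System.IO;
using System.IO.Compression;
using System.Text;

// CRC-32 lookup table as used by gzip: reflected polynomial 0xEDB88320.
uint[] table = new uint[256];
for (uint n = 0; n < 256; n++)
{
    uint c = n;
    for (int k = 0; k < 8; k++)
        c = (c & 1) != 0 ? 0xEDB88320u ^ (c >> 1) : c >> 1;
    table[n] = c;
}

byte[] a = Encoding.ASCII.GetBytes("old trace lines\n");
byte[] b = Encoding.ASCII.GetBytes("new trace lines\n");

// Extract CRC and ISIZE from the trailers of two separately written members.
(uint crcA, uint sizeA) = ReadTrailer(Gzip(a));
(uint crcB, uint sizeB) = ReadTrailer(Gzip(b));

// Trailer values for a single member holding a followed by b.
uint mergedCrc = Crc32Combine(crcA, crcB, sizeB);
uint mergedSize = unchecked(sizeA + sizeB); // ISIZE is the length mod 2^32

// Cross-check against a CRC computed directly over the concatenated data.
byte[] ab = new byte[a.Length + b.Length];
a.CopyTo(ab, 0);
b.CopyTo(ab, a.Length);
Console.WriteLine($"combined {mergedCrc:x8}, direct {Crc32(ab):x8}, ISIZE {mergedSize}");

// Standard CRC-32: initial value and final XOR are both 0xFFFFFFFF.
uint Crc32(byte[] data)
{
    uint crc = 0xFFFFFFFFu;
    foreach (byte x in data)
        crc = table[(crc ^ x) & 0xFF] ^ (crc >> 8);
    return crc ^ 0xFFFFFFFFu;
}

// Feed lengthB zero bytes through the raw register starting from crc(A),
// then XOR with crc(B). Naive O(lengthB) loop; zlib does this in O(log n).
uint Crc32Combine(uint first, uint second, long lengthB)
{
    uint crc = first;
    for (long i = 0; i < lengthB; i++)
        crc = table[crc & 0xFF] ^ (crc >> 8);
    return crc ^ second;
}

// Compress a buffer into one complete gzip member.
static byte[] Gzip(byte[] data)
{
    using MemoryStream ms = new();
    using (GZipStream gz = new(ms, CompressionLevel.Optimal, leaveOpen: true))
        gz.Write(data, 0, data.Length);
    return ms.ToArray();
}

// A gzip member ends with an 8-byte trailer: CRC-32, then ISIZE, both little-endian.
static (uint Crc, uint Size) ReadTrailer(byte[] member)
{
    ReadOnlySpan<byte> trailer = member.AsSpan(member.Length - 8);
    return (BinaryPrimitives.ReadUInt32LittleEndian(trailer),
            BinaryPrimitives.ReadUInt32LittleEndian(trailer.Slice(4)));
}
```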

pigz also shares some of the dictionary data between the chunks, and it does everything described above, so you may want to look at its sources.

Upvotes: 2

Santiago Squarzon

Reputation: 61103

Reading a gzip file with appended content is only an issue in .NET Framework. The workaround is to scan the file stream for the gzip magic bytes and open a separate decompression stream starting at each of those offsets. The solution is clearly inefficient, but it works.

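using System.IO;
using System.IO.Compression;

namespace GzipStuff;

public static class GzipFrameworkReader
{
    // First three bytes of every gzip member header (RFC 1952):
    // ID1 = 0x1f, ID2 = 0x8b, CM = 0x08 (deflate).
    private const byte GzipPreamble1 = 0x1f;

    private const byte GzipPreamble2 = 0x8b;

    private const byte GzipPreamble3 = 0x08;

    public static string ReadFile(string path)
    {
        int marker = 0; // number of header bytes matched so far
        int b;
        using FileStream fs = File.OpenRead(path);
        MemoryStream outmem = new();

        while ((b = fs.ReadByte()) != -1)
        {
            if (marker == 2 && (byte)b == GzipPreamble3)
            {
                // Found 1f 8b 08: decompress the member that starts 3 bytes back.
                // These bytes can also occur by chance inside compressed data,
                // in which case GZipStream will throw or produce garbage.
                AppendBytes(path, outmem, fs.Position - 3);
                marker = 0;
            }
            else if (marker == 1 && (byte)b == GzipPreamble2)
            {
                marker = 2;
            }
            else
            {
                // Mismatch: restart, but let this byte begin a new match
                // (handles sequences such as 1f 1f 8b 08).
                marker = (byte)b == GzipPreamble1 ? 1 : 0;
            }
        }

        outmem.Seek(0, SeekOrigin.Begin);
        using StreamReader reader = new(outmem);
        return reader.ReadToEnd();
    }

    private static void AppendBytes(string path, MemoryStream outmem, long pos)
    {
        // Independent stream positioned at the member start. On .NET Framework,
        // GZipStream stops after the first member, so each call decodes one member.
        using FileStream substream = File.OpenRead(path);
        substream.Seek(pos, SeekOrigin.Begin);
        using GZipStream gzip = new(substream, CompressionMode.Decompress);
        gzip.CopyTo(outmem);
    }
}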
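For comparison, a sketch of the same read outside .NET Framework. Per the claim above that appended content is only a problem there, it assumes that on .NET Core / .NET 5 and later a plain GZipStream continues through all concatenated members, so no magic-byte scanning is needed. Under that assumption, running `GzipFrameworkReader` on those runtimes would decode later members more than once, since each sub stream would also read past its own member.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Build two independently compressed members back to back, which is what
// appending with a fresh GZipStream per session produces.
using MemoryStream file = new();
foreach (string text in new[] { "first member\n", "second member\n" })
{
    using GZipStream gz = new(file, CompressionMode.Compress, leaveOpen: true);
    byte[] data = Encoding.UTF8.GetBytes(text);
    gz.Write(data, 0, data.Length);
}

// Assumption: .NET Core / .NET 5+, where GZipStream reads on into the next
// member instead of reporting end of stream after the first one.
file.Position = 0;
using GZipStream decompressor = new(file, CompressionMode.Decompress);
using StreamReader reader = new(decompressor, Encoding.UTF8);
string all = reader.ReadToEnd();
Console.Write(all);
```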

Upvotes: 0

Jack Davidson

Reputation: 4943

This took an incredibly long time for me to figure out. The standard C# implementation, GZipStream, has a bug (at least on .NET Framework): it does not support concatenated gzip files. It decompresses only the first part of a gzip file created by concatenation and then reports end of stream.

Here is an example that works for concatenated gzip files:

new StreamReader(new ICSharpCode.SharpZipLib.GZip.GZipInputStream(Console.OpenStandardInput()));

You can get the library you need from NuGet. I recommend JetBrains' version, JetBrains.SharpZLib.Stripped; NuGet threw an error when I tried to use the other option in a .NET Core project.

Upvotes: 2
