Reputation: 8008
I'm trying to use GZipStream to write some application traces (they tend to grow to huge sizes in production), so I need to be able to open an existing file and append to it using GZipStream. Everything seems to work well until we try to decompress the file. On decompression, GZipStream reads only the first chunk of data and then behaves as if it had reached EOF, even though the file contains a lot more. The strange thing is that when the file is opened with Windows or WinRAR, all the data is present and is extracted properly. Has anyone encountered this issue before?
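For reference, here is a minimal sketch of the append pattern described above. The class and method names (GzipTraceAppender, AppendLine) are made up for illustration, and the original code may well differ. The point it shows is that every append writes a complete gzip member of its own (header, compressed data, trailer), so the file ends up as a concatenation of members rather than one continuous stream.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class GzipTraceAppender
{
    // Hypothetical helper: appends one line to a gzip-compressed trace file.
    // Each call adds a complete, independent gzip member to the end of the file.
    public static void AppendLine(string path, string line)
    {
        // FileMode.Append creates the file if it is missing and positions at the end.
        using (var file = new FileStream(path, FileMode.Append, FileAccess.Write))
        using (var gzip = new GZipStream(file, CompressionMode.Compress))
        // UTF8Encoding(false) avoids writing a byte order mark at the start of every member.
        using (var writer = new StreamWriter(gzip, new UTF8Encoding(false)))
        {
            writer.WriteLine(line);
        }
    }
}
```

Calling GzipTraceAppender.AppendLine(path, "...") twice produces a file with two gzip members, which is the layout a reader has to cope with when decompressing.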
Upvotes: 5
Views: 2621
Reputation: 908
I've encountered the same problem. The idea is to implement something like what pigz (http://zlib.net/pigz/) does.
The idea is to remove the last 8 bytes of the old gzip chunk (the footer) and extract the CRC and size from it, then add some zeroes, append the new chunk, recalculate the source size and CRC based on the sizes and CRCs of the old and new chunks, and write the resulting footer. The problem is that I couldn't find a way to compute a valid combined CRC from the CRCs of the two parts. The new chunk also needs its header removed first.
pigz also shares some dictionary data between the chunks, and it does everything described above, so you may want to look at its source.
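The first step above, pulling the CRC and size out of the footer, can be sketched as follows. This is only an illustration under a few assumptions: the file ends with a complete gzip member, and the names GzipTrailer and Read are invented here. Per the gzip format, the trailer is a 4-byte CRC-32 followed by a 4-byte uncompressed size (modulo 2^32), both little-endian. Combining two CRCs is not shown; zlib exposes a crc32_combine function for that step.

```csharp
using System;
using System.IO;

public static class GzipTrailer
{
    // Hypothetical helper: reads the 8-byte trailer of the last gzip member in a file.
    public static void Read(string path, out uint crc32, out uint isize)
    {
        var trailer = new byte[8];
        using (var file = File.OpenRead(path))
        {
            // A member needs at least a 10-byte header plus the 8-byte trailer.
            if (file.Length < 18)
                throw new InvalidDataException("File is too short to hold a gzip member.");

            file.Seek(-8, SeekOrigin.End);
            int read = 0;
            while (read < trailer.Length)
            {
                int n = file.Read(trailer, read, trailer.Length - read);
                if (n == 0) throw new EndOfStreamException();
                read += n;
            }
        }
        crc32 = ReadUInt32LittleEndian(trailer, 0); // CRC-32 of the uncompressed data
        isize = ReadUInt32LittleEndian(trailer, 4); // uncompressed size modulo 2^32
    }

    // Assembles a little-endian 32-bit value regardless of the host byte order.
    private static uint ReadUInt32LittleEndian(byte[] buffer, int offset)
    {
        return (uint)buffer[offset]
            | ((uint)buffer[offset + 1] << 8)
            | ((uint)buffer[offset + 2] << 16)
            | ((uint)buffer[offset + 3] << 24);
    }
}
```

For a file with several appended members, this returns the values for the last member only, which is the one an append operation would need to rewrite.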
Upvotes: 2
Reputation: 61103
Reading a gzip file with appended content is only an issue in .NET Framework. The workaround is to scan the file stream for the gzip magic bytes and open a sub-stream at each offset where they are found. The solution is clearly inefficient, but it works.
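using System.IO;
using System.IO.Compression;

namespace GzipStuff;

public static class GzipFrameworkReader
{
    // A gzip member starts with the magic bytes 0x1f 0x8b, followed by 0x08 (deflate).
    private const byte GzipPreamble1 = 0x1f;
    private const byte GzipPreamble2 = 0x8b;
    private const byte GzipPreamble3 = 0x08;

    // Intended for .NET Framework, where GZipStream stops after the first member.
    // Note: the same three bytes can occur by chance inside compressed data;
    // such false positives are not filtered out here.
    public static string ReadFile(string path)
    {
        int marker = 0; // number of preamble bytes matched so far
        int b;
        using FileStream fs = File.OpenRead(path);
        MemoryStream outmem = new();
        while ((b = fs.ReadByte()) != -1)
        {
            if (marker == 0 && (byte)b == GzipPreamble1)
            {
                marker++;
                continue;
            }
            if (marker == 1)
            {
                if ((byte)b == GzipPreamble2)
                {
                    marker++;
                    continue;
                }
                // A repeated 0x1f may itself start a preamble, so keep it as the first match.
                marker = (byte)b == GzipPreamble1 ? 1 : 0;
            }
            if (marker == 2)
            {
                marker = 0;
                if ((byte)b == GzipPreamble3)
                {
                    // Full preamble matched: decompress the member that starts 3 bytes back.
                    AppendBytes(path, outmem, fs.Position - 3);
                }
                else if ((byte)b == GzipPreamble1)
                {
                    marker = 1;
                }
            }
        }
        outmem.Seek(0, SeekOrigin.Begin);
        using StreamReader reader = new(outmem);
        return reader.ReadToEnd();
    }

    // Opens a second handle on the file and decompresses one member starting at pos.
    private static void AppendBytes(string path, MemoryStream outmem, long pos)
    {
        using FileStream substream = File.OpenRead(path);
        substream.Seek(pos, SeekOrigin.Begin);
        using GZipStream gzip = new(substream, CompressionMode.Decompress);
        gzip.CopyTo(outmem);
    }
}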
Upvotes: 0
Reputation: 4943
This took me an incredibly long time to figure out. The standard .NET implementation, GZipStream, has a bug: it does not support concatenated gzip files. It will only decompress the first part of a gzip file created by concatenation and will report end of stream after that.
Here is an example that works for concatenated gzip files:
new StreamReader(new ICSharpCode.SharpZipLib.GZip.GZipInputStream(Console.OpenStandardInput()));
You can get the library you need from NuGet. I recommend the JetBrains version, JetBrains.SharpZLib.Stripped. NuGet threw an error when I tried to use the other option in a .NET Core project.
Upvotes: 2