Xaphann
Xaphann

Reputation: 3677

.NET 6 failing at Decompress large gzip text

I have to decompress some gzip text in .NET 6 app, however, on a string that is 20,627 characters long, it only decompresses about 1/3 of it. The code I am using code works for this string in .NET 5 or .NETCore 3.1 As well as smaller compressed strings.

public static string Decompress(this string compressedText)
{
    var gZipBuffer = Convert.FromBase64String(compressedText);
    using var memoryStream = new MemoryStream();
    int dataLength = BitConverter.ToInt32(gZipBuffer, 0);
    memoryStream.Write(gZipBuffer, 4, gZipBuffer.Length - 4);
    var buffer = new byte[dataLength];
    memoryStream.Position = 0;
    using (var gZipStream = new GZipStream(memoryStream, CompressionMode.Decompress))
    {
        gZipStream.Read(buffer, 0, buffer.Length);
    }
    return Encoding.UTF8.GetString(buffer);
}

The results look something like this:

Star of amazing text..... ...Text is fine till 33,619 after that is allNULLNULLNULLNULL

The rest of the file after the 33,618 characters is just nulls.

I have no idea why this is happening.

Edit: I updated this when I found the issue was not Blazor but in fact .NET 6. I took a project that was working in .NET Core 3.1 changed nothing other than compiling for .NET 6 and got the same error. The update reflects this.

Edit2: Just tested and it works in .NET 5 so it just .NET 6 that this error happens in.

Upvotes: 17

Views: 4040

Answers (1)

Wiktor Zychla
Wiktor Zychla

Reputation: 48240

Just confirmed that the article linked in the comments below the question contains a valid clue on the issue.

Corrected code would be:

string Decompress(string compressedText)
{
    var gZipBuffer = Convert.FromBase64String(compressedText);

    using var memoryStream = new MemoryStream();
    int dataLength = BitConverter.ToInt32(gZipBuffer, 0);
    memoryStream.Write(gZipBuffer, 4, gZipBuffer.Length - 4);

    var buffer = new byte[dataLength];
    memoryStream.Position = 0;

    using var gZipStream = new GZipStream(memoryStream, CompressionMode.Decompress);

    int totalRead = 0;
    while (totalRead < buffer.Length)
    {
        int bytesRead = gZipStream.Read(buffer, totalRead, buffer.Length - totalRead);
        if (bytesRead == 0) break;
        totalRead += bytesRead;
    }

    return Encoding.UTF8.GetString(buffer);
}

This approach changes

gZipStream.Read(buffer, 0, buffer.Length);

to

    int totalRead = 0;
    while (totalRead < buffer.Length)
    {
        int bytesRead = gZipStream.Read(buffer, totalRead, buffer.Length - totalRead);
        if (bytesRead == 0) break;
        totalRead += bytesRead;
    }

which takes the Read's return value into account correctly.

Without the change, the issue is easily repeatable on any string random enough to produce a gzip of length > ~10kb.

Here's the compressor, if anyone's interested in testing this on your own

string Compress(string plainText)
{
    var buffer = Encoding.UTF8.GetBytes(plainText);
    using var memoryStream = new MemoryStream();

    var lengthBytes = BitConverter.GetBytes((int)buffer.Length);
    memoryStream.Write(lengthBytes, 0, lengthBytes.Length);

    using var gZipStream = new GZipStream(memoryStream, CompressionMode.Compress);
    
    gZipStream.Write(buffer, 0, buffer.Length);
    gZipStream.Flush();

    var gZipBuffer = memoryStream.ToArray();

    return Convert.ToBase64String(gZipBuffer);
}

Upvotes: 26

Related Questions