dekajoo

Compression and UTF-8 encoding

Can someone tell me why I'm losing information in this process? Some UTF-8 characters appear not to be decoded, e.g. "Biography":"\u003clink type=... or Steve Blunt \u0026 Marty Kelley, while others are decoded fine, e.g. "Name":"朱敬

// Creating a base64 string containing gzip data (s is the input string)
string bar;
using (MemoryStream ms = new MemoryStream())
{
    using (GZipStream gzip = new GZipStream(ms, CompressionMode.Compress))
    using (StreamWriter writer = new StreamWriter(gzip, System.Text.Encoding.UTF8))
    {
        writer.Write(s);
    }
    ms.Flush();
    bar = Convert.ToBase64String(ms.ToArray());
}

// Reading it
string foo;
byte[] itemData = Convert.FromBase64String(bar);
using (MemoryStream src = new MemoryStream(itemData))
using (GZipStream gzs = new GZipStream(src, CompressionMode.Decompress))
using (MemoryStream dest = new MemoryStream(itemData.Length*2))
{
    gzs.CopyTo(dest);
    foo = Encoding.UTF8.GetString(dest.ToArray());
}

Console.WriteLine(foo);

Answers (2)

dekajoo

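The issue was simply that those characters were already escaped in the source string: it contained the literal text \u003c and \u0026 rather than < and &, so the compression round trip reproduced them faithfully.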
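A minimal sketch of the distinction, assuming .NET Core 3.0 or later (where System.Text.Json ships in the box). The question doesn't say where the source string came from; escapes like \u003c and \u0026 are typically produced by a JSON serializer that escapes HTML-sensitive characters. Such text is six ASCII characters per escape, and only a JSON parser turns it back into < and &. The names EscapedJson, GetDecodedProperty and Sample are hypothetical, chosen for illustration.

```csharp
using System.Text.Json;

public static class EscapedJson
{
    // Hypothetical helper: parses a JSON object and returns the decoded string
    // value of one property. JSON escapes such as \u003c and \u0026 become
    // '<' and '&' here, not during gzip/base64 round-tripping.
    public static string GetDecodedProperty(string json, string propertyName)
    {
        using JsonDocument doc = JsonDocument.Parse(json);
        return doc.RootElement.GetProperty(propertyName).GetString() ?? "";
    }

    // Hypothetical sample input. It is a verbatim string, so C# does NOT process
    // \u003c: the string holds a backslash, 'u', '0', '0', '3', 'c'.
    public const string Sample =
        @"{""Biography"":""\u003clink\u003e Steve Blunt \u0026 Marty Kelley""}";
}
```

Compressing and decompressing Sample gives back exactly the same escaped text; EscapedJson.GetDecodedProperty(EscapedJson.Sample, "Biography") is what yields "<link> Steve Blunt & Marty Kelley".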

PS: Credit goes to rik for this answer. :)

Edit: I also had the StreamReader issue that matthew-watson suggested.

Matthew Watson

It could be because you are writing the string using StreamWriter but reading it using CopyTo() and Encoding.GetString().

What happens if you try this?

// Reading it
string foo;
byte[] itemData = Convert.FromBase64String(bar);
using (MemoryStream src = new MemoryStream(itemData))
using (GZipStream gzs = new GZipStream(src, CompressionMode.Decompress))
using (StreamReader reader = new StreamReader(gzs, Encoding.UTF8))
{
    foo = reader.ReadToEnd(); // ReadLine() would stop at the first line break
}
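One concrete way the two read paths differ (my reading of why StreamReader matters; the answer above doesn't name it) is the UTF-8 byte order mark. Encoding.UTF8 has a preamble (EF BB BF), and StreamWriter writes it at the start because GZipStream is not seekable. Encoding.UTF8.GetString keeps those bytes as an invisible U+FEFF character, whereas StreamReader detects and skips it. This adds one character at the front; it does not create \u003c-style escapes. The class and method names below are hypothetical.

```csharp
using System.IO;
using System.IO.Compression;
using System.Text;

public static class BomDemo
{
    // Compresses text the same way as the question: StreamWriter + Encoding.UTF8,
    // which writes the EF BB BF preamble before the text.
    public static byte[] Compress(string text)
    {
        using var ms = new MemoryStream();
        using (var gzip = new GZipStream(ms, CompressionMode.Compress))
        using (var writer = new StreamWriter(gzip, Encoding.UTF8))
        {
            writer.Write(text);
        }
        return ms.ToArray(); // ToArray still works after the stream is closed
    }

    // Question's read path: CopyTo + GetString. The BOM survives as U+FEFF.
    public static string DecodeWithGetString(byte[] data)
    {
        using var src = new MemoryStream(data);
        using var gzs = new GZipStream(src, CompressionMode.Decompress);
        using var dest = new MemoryStream();
        gzs.CopyTo(dest);
        return Encoding.UTF8.GetString(dest.ToArray());
    }

    // Answer's read path: StreamReader, which skips the BOM by default.
    public static string DecodeWithStreamReader(byte[] data)
    {
        using var src = new MemoryStream(data);
        using var gzs = new GZipStream(src, CompressionMode.Decompress);
        using var reader = new StreamReader(gzs, Encoding.UTF8);
        return reader.ReadToEnd();
    }
}
```

If GetString is preferred on the read side, the writer can instead be given new UTF8Encoding(false), which does not emit the preamble.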

Although I think you should be using BinaryReader and BinaryWriter:

string s = "Biography:\u003clink type...";
string bar;
using (MemoryStream ms = new MemoryStream())
{
    using (GZipStream gzip = new GZipStream(ms, CompressionMode.Compress))
    using (var writer = new BinaryWriter(gzip, Encoding.UTF8))
    {
        writer.Write(s);
    }
    ms.Flush();
    bar = Convert.ToBase64String(ms.ToArray());
}

// Reading it
string foo;
byte[] itemData = Convert.FromBase64String(bar);
using (MemoryStream src = new MemoryStream(itemData))
using (GZipStream gzs = new GZipStream(src, CompressionMode.Decompress))
using (var reader = new BinaryReader(gzs, Encoding.UTF8))
{
    foo = reader.ReadString();
}

Console.WriteLine(foo);
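Note that the BinaryWriter variant changes the byte format, not just the API. BinaryWriter.Write(string) writes the UTF-8 byte count as a 7-bit encoded integer, then the UTF-8 bytes, and writes no BOM. The sketch below (hypothetical names, no compression, so the raw bytes are visible) shows that prefix.

```csharp
using System.IO;
using System.Text;

public static class LengthPrefixDemo
{
    // Hypothetical helper: writes one string with BinaryWriter and returns the
    // raw bytes, i.e. [7-bit encoded byte count][UTF-8 bytes].
    public static byte[] WriteWithBinaryWriter(string text)
    {
        using var ms = new MemoryStream();
        using (var writer = new BinaryWriter(ms, Encoding.UTF8, leaveOpen: true))
        {
            writer.Write(text);
        }
        return ms.ToArray();
    }
}
```

For "h\u00e9llo" the result is 7 bytes: a prefix of 6 (é takes two UTF-8 bytes) followed by the text. The prefix lets BinaryReader.ReadString know where the string ends, but it also means the decompressed data is no longer plain text; anything outside .NET reading the base64 string would need to parse that prefix, so the StreamWriter/StreamReader pair is the more portable choice in that case.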