Reputation: 144
I am trying to compress an ASCII string (base64) using GZip; however, it is producing more data instead of less. Can anyone help?
It is an old project and I'm limited to the compilers and Framework versions. I have tried MSBuild 2.0, 3.5 & 4.0 - all produce the same erroneous results.
    Imports System.IO
    Imports System.IO.Compression
    Imports System.Text

    Private Function GZipString(ByVal asciiString As String) As Byte()
        Debug.Print("asciiString length : {0}", asciiString.Length)
        Dim asciibytes As Byte() = Encoding.ASCII.GetBytes(asciiString)
        Debug.Print("asciibytes length : {0}", asciibytes.Length)
        'GZip the string
        Dim ms As New MemoryStream()
        Dim gzips As New GZipStream(ms, CompressionMode.Compress)
        gzips.Write(asciibytes, 0, asciibytes.Length)
        gzips.Close()
        GZipString = ms.ToArray
        ms.Close()
        Debug.Print("compressedBytes length : {0}", GZipString.Length)
    End Function
The output I am getting is:-
Upvotes: 1
Views: 2188
Reputation: 6412
1.) This is 2019; use UTF8, not ASCII. UTF8 is a superset of ASCII (pure-ASCII text encodes to exactly the same bytes), with the added support of international characters (kanji, emojis, anything you can type). Most web browsers and servers have defaulted to UTF8 for years. You should generally avoid ASCII unless you are working with legacy software that requires it (and even then, only for the part of the code that talks to that legacy system).
2.) For reference, compressing a string won't always result in fewer bytes, especially with small strings. A 3600-character string, though, should definitely compress: unless it contains completely random garbage, human-typed plain text of that length will shrink.
3.) You should be properly disposing of objects (via using statements). Not doing so can lead to resource and/or memory leaks.
4.) Either your compression or your decompression code is wrong, and GZipStream can be super finicky, so I've included tested code that works for C# and VB.NET below.
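To illustrate point 2, here is a minimal sketch. It is not part of the tested code, and the class name GZipSizeDemo, the helper GZip and the sample inputs are all made up for illustration. A gzip stream always carries a 10-byte header and an 8-byte trailer, so a very short input cannot come out smaller, while repetitive text of a few thousand characters shrinks considerably. The third sample (base64 of pseudo-random bytes) is only an assumption about what hard-to-compress input might look like; its result depends on the Framework version, so no size is claimed for it.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

static class GZipSizeDemo
{
    // Hypothetical helper: gzips a byte array using the CompressionMode
    // overload, which is available from .NET Framework 2.0 onwards.
    public static byte[] GZip(byte[] data)
    {
        using (MemoryStream ms = new MemoryStream())
        {
            using (GZipStream gz = new GZipStream(ms, CompressionMode.Compress))
            {
                gz.Write(data, 0, data.Length);
            } // disposing the GZipStream writes the final block and the trailer

            // ToArray is still allowed after the MemoryStream has been closed.
            return ms.ToArray();
        }
    }

    static void Main()
    {
        // Sample 1: tiny input, where header/trailer overhead dominates.
        byte[] tiny = Encoding.UTF8.GetBytes("hi");

        // Sample 2: repetitive plain text (4400 characters).
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 100; i++)
        {
            sb.Append("the quick brown fox jumps over the lazy dog ");
        }
        byte[] text = Encoding.UTF8.GetBytes(sb.ToString());

        // Sample 3 (assumption): base64 of 2700 pseudo-random bytes,
        // which yields a 3600-character string.
        byte[] raw = new byte[2700];
        new Random(42).NextBytes(raw);
        byte[] b64 = Encoding.UTF8.GetBytes(Convert.ToBase64String(raw));

        Console.WriteLine("tiny:   " + tiny.Length + " -> " + GZip(tiny).Length);
        Console.WriteLine("text:   " + text.Length + " -> " + GZip(text).Length);
        Console.WriteLine("base64: " + b64.Length + " -> " + GZip(b64).Length);
    }
}
```

If the "text" line shrinks on your machine but your real input does not, the input itself is probably the limiting factor rather than the stream handling.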
C#:

    using System;
    using System.IO;
    using System.IO.Compression;
    using System.Text;

    static class Program
    {
        static void Main(string[] args)
        {
            var input = string.Join(" ", args);
            var compressedBytes = CompressString(input);
            var dec = DecompressString(compressedBytes);
            Console.WriteLine("Input Length = " + input.Length); // 537
            Console.WriteLine("Uncompressed Size = " + Encoding.UTF8.GetBytes(input).Length); // 539
            Console.WriteLine("Compressed Size = " + compressedBytes.Length); // 354 (smaller!)
            Console.WriteLine("Decompressed Length = " + dec.Length); // 537 (same size!)
            Console.WriteLine("Roundtrip Successful: " + (input == dec)); // True
        }

        public static string DecompressString(byte[] bytes)
        {
            using (var ms = new MemoryStream(bytes))
            using (var ds = new GZipStream(ms, CompressionMode.Decompress))
            using (var sr = new StreamReader(ds)) // StreamReader decodes as UTF8 by default
            {
                return sr.ReadToEnd();
            }
        }

        public static byte[] CompressString(string input)
        {
            using (var ms = new MemoryStream())
            using (var cs = new GZipStream(ms, CompressionLevel.Optimal)) // CompressionLevel needs .NET 4.5+
            {
                var bytes = Encoding.UTF8.GetBytes(input);
                cs.Write(bytes, 0, bytes.Length);
                // *REQUIRED* or last chunk will be omitted. Do NOT call any other close or
                // flush method.
                cs.Close();
                return ms.ToArray();
            }
        }
    }
VB.NET (gross, I feel dirty 😜):
    Imports System
    Imports System.IO
    Imports System.IO.Compression
    Imports System.Text

    Module Program
        Sub Main(args As String())
            Dim input As String = String.Join(" ", args)
            Dim compressedBytes As Byte() = CompressString(input)
            Dim dec As String = DecompressString(compressedBytes)
            Console.WriteLine("Input Length = " & input.Length) ' 537
            Console.WriteLine("Uncompressed Size = " & Encoding.UTF8.GetBytes(input).Length) ' 539
            Console.WriteLine("Compressed Size = " & compressedBytes.Length) ' 354 (smaller!)
            Console.WriteLine("Decompressed Length = " & dec.Length) ' 537 (same size!)
            Console.WriteLine("Roundtrip Successful: " & (input = dec).ToString()) ' True
        End Sub

        Public Function DecompressString(ByVal bytes As Byte()) As String
            Using ms = New MemoryStream(bytes)
                Using ds = New GZipStream(ms, CompressionMode.Decompress)
                    Using sr = New StreamReader(ds) ' StreamReader decodes as UTF8 by default
                        Return sr.ReadToEnd()
                    End Using
                End Using
            End Using
        End Function

        Public Function CompressString(input As String) As Byte()
            Using ms = New MemoryStream
                Using cs = New GZipStream(ms, CompressionLevel.Optimal) ' CompressionLevel needs .NET 4.5+
                    Dim bytes As Byte() = Encoding.UTF8.GetBytes(input)
                    cs.Write(bytes, 0, bytes.Length)
                    ' *REQUIRED* or last chunk will be omitted. Do NOT call any other close or
                    ' flush method.
                    cs.Close()
                    Return ms.ToArray()
                End Using
            End Using
        End Function
    End Module
For .NET 3.5, this still works (and produces a smaller object, though not as small as 4.8: it only compresses down to 497 bytes instead of 354 bytes with my sample data). You just need to change CompressionLevel.Optimal to CompressionMode.Compress, because the CompressionLevel overload was only added in .NET Framework 4.5.
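For the older Framework versions, the change might look roughly like the sketch below. The class name LegacyGZip is made up for illustration, and I've avoided var so that it should also build with a C# 2.0 compiler (an assumption on my part; I only measured the 3.5 and 4.8 figures quoted above).

```csharp
using System.IO;
using System.IO.Compression;
using System.Text;

static class LegacyGZip
{
    // Same idea as CompressString above, but using the CompressionMode
    // overload instead of CompressionLevel (which needs .NET 4.5+).
    public static byte[] CompressString(string input)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(input);
        using (MemoryStream ms = new MemoryStream())
        {
            using (GZipStream cs = new GZipStream(ms, CompressionMode.Compress))
            {
                cs.Write(bytes, 0, bytes.Length);
            } // the GZipStream must be closed before the bytes are read back

            return ms.ToArray();
        }
    }

    // Reverses CompressString; the encoding is stated explicitly so it
    // matches the one used when compressing.
    public static string DecompressString(byte[] bytes)
    {
        using (MemoryStream ms = new MemoryStream(bytes))
        using (GZipStream ds = new GZipStream(ms, CompressionMode.Decompress))
        using (StreamReader sr = new StreamReader(ds, Encoding.UTF8))
        {
            return sr.ReadToEnd();
        }
    }
}
```

A quick sanity check is a round trip: LegacyGZip.DecompressString(LegacyGZip.CompressString(s)) should give back s unchanged.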
Upvotes: 3