user34537

Why is deflate making my data BIGGER?

I wanted to compress some data, so I thought I'd run the stream through Deflate.

It went from 304 bytes to 578. That's 1.9x larger. I was trying to compress it... What am I doing wrong here?

// Needs: using System.IO; using System.IO.Compression;
using (MemoryStream ms2 = new MemoryStream())
{
    using (var ms = new DeflateStream(ms2, CompressionMode.Compress, true)) // true = leave ms2 open
    {
        ms.WriteByte(1);
        ms.WriteShort((short)txtbuf.Length); // WriteShort is a custom extension method, not a Stream member
        ms.Write(txtbuf, 0, txtbuf.Length);
        ms.WriteShort((short)buf2.Length);
        ms.Write(buf2, 0, buf2.Length);
        ms.WriteShort((short)buf3.Length);
        ms.Write(buf3, 0, buf3.Length);
    } // dispose the DeflateStream before reading ms2 so the final deflate block is written
    result_buf = ms2.ToArray();
}

Upvotes: 4

Views: 1919

Answers (6)

Matthew1471

Reputation: 248

I don't have the reputation to leave a comment; however, the reason the compression performance is worse than you would expect is apparently not a bug per se, but a patent issue:

The reason for the compression level not being as good as with some other applications is that the most efficient compression algorithms on the market are all patent-protected. .net on the other hand uses a non-patented one.

and

Well, the explanation I got (from someone at MS), when I asked the same thing, was that it had to do with Microsoft not being able to use the GZip algorithm without modifying it, due to patent/licensing issues.

http://social.msdn.microsoft.com/Forums/fr-FR/c5f0b53c-a2d5-4407-b43b-9da8d39c01df/why-do-gzipstream-compression-ratio-so-bad?forum=netfxbcl

Initially I suspected Microsoft's gzip implementation; I knew that they implemented the Deflate algorithm, which isn't the most effective but is free of patents.

http://challenge-me.ws/post/2010/11/05/Do-Not-Take-Microsofts-Code-for-Granted.aspx

Upvotes: 0

Mark Adler

Reputation: 112597

The degree to which your data is expanding is a bug in the DeflateStream class. The bug also exists in the GZipStream class. See my description of this problem here: Why does my C# gzip produce a larger file than Fiddler or PHP?

Do not use the DeflateStream class provided by Microsoft. Use DotNetZip instead, which provides replacement classes.

Incompressible data will expand slightly when you try to compress it, but only by a small amount. The maximum expansion from a properly written deflate compressor is five bytes plus a small fraction of a percent. zlib's expansion of incompressible data (with the default settings for raw deflate) is 5 bytes + 0.03% of the input size. Your 304 bytes, if incompressible, should come out as 309 bytes from a raw deflate compressor like DeflateStream. A factor of 1.9 expansion on something more than five or six bytes in length is a bug.
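
A minimal sketch of the swap, assuming the DotNetZip package is referenced; its replacement classes live in the Ionic.Zlib namespace, and txtbuf / result_buf are the question's own variables:

using (var ms2 = new MemoryStream())
{
    using (var ms = new Ionic.Zlib.DeflateStream(
        ms2, Ionic.Zlib.CompressionMode.Compress, true)) // true = leave ms2 open
    {
        ms.Write(txtbuf, 0, txtbuf.Length);
    } // disposing the stream emits the final deflate block
    result_buf = ms2.ToArray();
}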

Upvotes: 5

Mike Bailey

Reputation: 12817

You answered your own question in your comment:

i dont know what i changed but the data is randomly made in every run

Random data is hard to compress. In general, when data has many patterns within it (like the text of a dictionary or a website), it compresses well. But the worst case for a compression algorithm is random data. Truly random data does not have any patterns in it; how then can a compression algorithm hope to compress it?

The next thing to take into account is that compression algorithms have overhead in how they store data. They usually have some header bits followed by some symbol data. With random data, it's almost impossible to re-encode the data into a shorter form, and you end up with header bits interspersed between your data that serve no purpose other than to say "the following data is represented as such."

Depending on your compression format, the overhead as a percentage of the total file size can either be relatively small or large. In either case though, you will have overhead that will make your new file larger than your old one.
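
A quick way to see this is to deflate a random buffer and an all-zero buffer of the same size. This is a hypothetical sketch using System.IO.Compression; exact sizes vary by runtime, but the random input reliably grows:

using System;
using System.IO;
using System.IO.Compression;

class RandomDataDemo
{
    static byte[] Deflate(byte[] input)
    {
        using (var output = new MemoryStream())
        {
            using (var ds = new DeflateStream(output, CompressionMode.Compress, true))
                ds.Write(input, 0, input.Length);
            return output.ToArray(); // read only after the DeflateStream is disposed
        }
    }

    static void Main()
    {
        byte[] random = new byte[304];
        new Random().NextBytes(random); // no patterns: incompressible
        byte[] zeros = new byte[304];   // one long run: highly compressible

        Console.WriteLine("random: {0} -> {1}", random.Length, Deflate(random).Length);
        Console.WriteLine("zeros:  {0} -> {1}", zeros.Length, Deflate(zeros).Length);
    }
}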

Upvotes: 0

Patrick Hughes

Reputation: 357

Small blocks of data often end up larger because the compression algorithm adds a code table to the output, or because it needs a larger sample to find enough repetition to work with.

You're not doing anything wrong.
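
To see that fixed cost being amortized, deflate the same repeating pattern at a few sizes; a sketch reusing the hypothetical Deflate helper from the example under Mike Bailey's answer:

foreach (int length in new[] { 10, 100, 10000 })
{
    byte[] input = System.Text.Encoding.ASCII.GetBytes(new string('x', length));
    Console.WriteLine("{0,6} -> {1,6}", input.Length, Deflate(input).Length);
}

The 10-byte input typically comes out larger than it went in, while the 10,000-byte input shrinks dramatically.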

Upvotes: 3

darkey

Reputation: 3722

Shouldn't it be

using (var ms = new DeflateStream(ms2, CompressionMode.Compress, true))

instead of

using (var ms = new DeflateStream(ms, CompressionMode.Compress, true))

If you want to decorate your MemoryStream with a DeflateStream, it should be this way around.

Upvotes: 2

Wulfram

Reputation: 3382

It's possible that the data you are trying to compress is not actually compressible (or that there is not much of it to begin with). Compression works best when there is repetition in the data.

It's probably bigger because the compression scheme adds metadata used to decode the stream; since the data is not compressible, or there is not enough of it for compression to take effect, that overhead actually makes the output worse.

If you did something like zip a ZIP file, you would find that compression does not always make things smaller.
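
A sketch of that effect, again reusing the hypothetical Deflate helper from the earlier example: compress a highly repetitive buffer once, then compress the compressed output a second time.

byte[] text  = System.Text.Encoding.ASCII.GetBytes(new string('a', 1000)); // very compressible
byte[] once  = Deflate(text);   // shrinks a lot
byte[] twice = Deflate(once);   // usually grows: the output of a good compressor looks random
Console.WriteLine("{0} -> {1} -> {2} bytes", text.Length, once.Length, twice.Length);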

Upvotes: 3
