Reputation:
I wanted to compress some data, so I thought I'd run the stream through deflate.
It went from 304 bytes to 578 bytes. That's 1.9x larger. I was trying to compress it... What am I doing wrong here?
using (MemoryStream ms2 = new MemoryStream())
using (var ms = new DeflateStream(ms2, CompressionMode.Compress, true))
{
    ms.WriteByte(1);
    ms.WriteShort((short)txtbuf.Length);   // WriteShort is presumably a custom extension method; Stream has no such member
    ms.Write(txtbuf, 0, txtbuf.Length);
    ms.WriteShort((short)buf2.Length);
    ms.Write(buf2, 0, buf2.Length);
    ms.WriteShort((short)buf3.Length);
    ms.Write(buf3, 0, buf3.Length);
    ms.Flush();
    result_buf = ms2.ToArray();
}
Upvotes: 4
Views: 1919
Reputation: 248
I don't have the reputation to leave a comment; however, the reason the compression performance is worse than you would expect is apparently not due to a bug per se, but to patent issues:
The reason for the compression level not being as good as with some other applications is that the most efficient compression algorithms on the market are all patent-protected. .NET, on the other hand, uses a non-patented one.
and
Well, the explanation I got (from someone at MS), when I asked the same thing, was that it had to do with Microsoft not being able to use the GZip algorithm without modifying it, due to patent/licensing issues.
Initially I suspected Microsoft’s gzip implementation; I knew that they implemented the Deflate algorithm, which isn’t the most effective but is free of patents.
http://challenge-me.ws/post/2010/11/05/Do-Not-Take-Microsofts-Code-for-Granted.aspx
Upvotes: 0
Reputation: 112597
The degree to which your data is expanding is a bug in the DeflateStream class. The bug also exists in the GZipStream class. See my description of this problem here: Why does my C# gzip produce a larger file than Fiddler or PHP?.
Do not use the DeflateStream class provided by Microsoft. Use DotNetZip instead, which provides replacement classes.
Incompressible data will expand slightly when you try to compress it, but only by a small amount. The maximum expansion from a properly written deflate compressor is five bytes plus a small fraction of a percent. zlib's expansion of incompressible data (with the default settings for raw deflate) is 5 bytes + 0.03% of the input size. Your 304 bytes, if incompressible, should come out as 309 bytes from a raw deflate compressor like DeflateStream. A factor of 1.9 expansion on something more than five or six bytes in length is a bug.
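For illustration, here is a minimal sketch of what the swap might look like, assuming the DotNetZip package is referenced and that its Ionic.Zlib.DeflateStream constructor mirrors the BCL one (the payload parameter and method name are placeholders):
using System.IO;
using Ionic.Zlib;   // DotNetZip's replacement classes

static byte[] CompressWithDotNetZip(byte[] payload)
{
    using (var output = new MemoryStream())
    {
        // leaveOpen: true keeps the MemoryStream usable after the compressor is disposed
        using (var deflate = new DeflateStream(output, CompressionMode.Compress,
                                               CompressionLevel.Default, true))
        {
            deflate.Write(payload, 0, payload.Length);
        }   // disposing flushes the final deflate block
        return output.ToArray();
    }
}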
Upvotes: 5
Reputation: 12817
You answered your own question in your comment:
I don't know what I changed, but the data is randomly generated in every run
Random data is hard to compress. In general, when data has many patterns within it (like the text from a dictionary or a website), it compresses well. But the worst case for a compression algorithm is random data. Truly random data does not have any patterns in it; how then can a compression algorithm expect to be able to compress it?
The next thing to take into account is the fact that certain compression algorithms have overhead in how they store data. They usually have some header bits followed by some symbol data. With random data, it's almost impossible to compress it into a more compact form, so you end up with lots of header bits interspersed throughout your data that serve no purpose other than to say "the following data is represented as such."
Depending on your compression format, the overhead as a percentage of the total file size can either be relatively small or large. In either case though, you will have overhead that will make your new file larger than your old one.
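To see this concretely, here is a rough sketch (the buffer size mirrors the question; the class and method names are arbitrary) that deflates a random buffer and an all-zero buffer of the same length. The random one will not shrink, while the repetitive one does:
using System;
using System.IO;
using System.IO.Compression;

class CompressibilityDemo
{
    static int DeflatedSize(byte[] input)
    {
        using (var output = new MemoryStream())
        {
            using (var deflate = new DeflateStream(output, CompressionMode.Compress, true))
            {
                deflate.Write(input, 0, input.Length);
            }   // dispose so the final deflate block is written
            return (int)output.Length;
        }
    }

    static void Main()
    {
        var random = new byte[304];
        new Random().NextBytes(random);      // no patterns for the compressor to exploit

        var repetitive = new byte[304];      // all zeros: trivially compressible

        Console.WriteLine("random:     " + DeflatedSize(random) + " bytes");
        Console.WriteLine("repetitive: " + DeflatedSize(repetitive) + " bytes");
    }
}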
Upvotes: 0
Reputation: 357
Small blocks of data often end up larger because the compression algorithm uses a code table that gets added to the output, or because it needs a larger sample to find enough redundancy to work with.
You're not doing anything wrong.
Upvotes: 3
Reputation: 3722
Shouldn't it be
using (var ms = new DeflateStream(ms2, CompressionMode.Compress, true))
instead of
using (var ms = new DeflateStream(ms, CompressionMode.Compress, true))
If you want to decorate your MemoryStream with a DeflateStream, it should be this way around.
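A minimal sketch of that pattern (only txtbuf shown; the other buffers follow the same shape): the DeflateStream decorates the MemoryStream, and the compressed bytes are read from the MemoryStream only after the DeflateStream has been disposed, which guarantees the final deflate block is written.
byte[] result_buf;
using (var ms2 = new MemoryStream())
{
    using (var ms = new DeflateStream(ms2, CompressionMode.Compress, true))
    {
        ms.WriteByte(1);
        ms.Write(txtbuf, 0, txtbuf.Length);
    }   // dispose the compressor before reading the underlying stream
    result_buf = ms2.ToArray();
}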
Upvotes: 2
Reputation: 3382
It's possible that the data you are trying to compress is not actually compressible (or you do not have a lot of data to compress to begin with). Compression works best when there are repetitions in the data.
It's probably bigger because the compression scheme is adding metadata used to decompress the stream, but because the data is not compressible, or there is not a lot of data for compression to take effect, it actually makes things worse.
If you did something like zip a zip file, you would find that compression does not always make things smaller.
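A rough sketch of that effect using the same DeflateStream class (names are arbitrary): deflate a compressible buffer once, then deflate the output again; the second pass typically comes out no smaller, and often slightly larger.
using System;
using System.IO;
using System.IO.Compression;

class DoubleDeflateDemo
{
    static byte[] Deflate(byte[] input)
    {
        using (var output = new MemoryStream())
        {
            using (var deflate = new DeflateStream(output, CompressionMode.Compress, true))
            {
                deflate.Write(input, 0, input.Length);
            }
            return output.ToArray();
        }
    }

    static void Main()
    {
        var data = new byte[4096];
        for (int i = 0; i < data.Length; i++)
            data[i] = (byte)(i % 16);            // repeated pattern: compresses well

        byte[] once = Deflate(data);             // big win on the first pass
        byte[] twice = Deflate(once);            // little or nothing left to squeeze

        Console.WriteLine("original: " + data.Length +
                          ", once: " + once.Length +
                          ", twice: " + twice.Length);
    }
}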
Upvotes: 3