jkh

Reputation: 3678

Reading compressed file and writing to new file will not allow decompression

I have a test program that demonstrates the end result that I am hoping for (even though in this test program the steps may seem unnecessary).

The program compresses data to a file using GZipStream. The resulting compressed file is C:\mydata.dat.

I then read this file, and write it to a new file.

//Read original file
string compressedFile = String.Empty;
using (StreamReader reader = new StreamReader(@"C:\mydata.dat"))
{
    compressedFile = reader.ReadToEnd();
    reader.Close();
    reader.Dispose();
}

//Write to a new file
using (StreamWriter file = new StreamWriter(@"C:\mynewdata.dat"))
{
    file.WriteLine(compressedFile);
}

When I try to decompress the two files, the original one decompresses perfectly, but the new file throws an InvalidDataException with the message "The magic number in GZip header is not correct. Make sure you are passing in a GZip stream."

Why are these files different?

Upvotes: 3

Views: 1519

Answers (2)

musefan

Reputation: 48415

EDIT: Apparently, my suggestions are wrong/invalid/whatever... please use one of the others, which have no doubt been highly refactored to the point where no extra performance could possibly be achieved (else, that would mean they are just as invalid as mine).

using (System.IO.StreamReader sr = new System.IO.StreamReader(@"C:\mydata.dat"))
{
    using (System.IO.StreamWriter sw = new System.IO.StreamWriter(@"C:\mynewdata.dat"))
    {
        byte[] bytes = new byte[1024];
        int count = 0;
        while((count = sr.BaseStream.Read(bytes, 0, bytes.Length)) > 0){
            sw.BaseStream.Write(bytes, 0, count);
        }
    }
}

Read all bytes

byte[] bytes = null;
using (System.IO.StreamReader sr = new System.IO.StreamReader(@"C:\mydata.dat"))
{
    bytes = new byte[sr.BaseStream.Length];
    int index = 0;
    int count = 0;
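    // Never request more bytes than remain in the buffer, otherwise Read throws an ArgumentException
    while((count = sr.BaseStream.Read(bytes, index, System.Math.Min(1024, bytes.Length - index))) > 0){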
        index += count;
    }
}

Read all bytes/write all bytes (from svick's answer):

byte[] bytes = File.ReadAllBytes(@"C:\mydata.dat");
File.WriteAllBytes(@"C:\mynewdata.dat", bytes);

PERFORMANCE TESTING WITH OTHER ANSWERS:

I just did a quick test between my answer (StreamReader; the first part above, the file copy) and svick's answer (FileStream/MemoryStream; the first one). Each test is 1000 iterations of the code. Here are the results from 4 tests (results are in whole seconds; all actual results were slightly over these values):

My Code | svick code
--------------------
9       | 12
9       | 14
8       | 13
8       | 14

As you can see, in my test at least, my code performed better. One thing perhaps to note with mine is I am not reading a character stream, I am in fact accessing the BaseStream which is providing a byte stream. Perhaps svick's answer is slow because he is using two streams for reading, then two for writing. Of course, there is a lot of optimisation that could be done to svick's answer to improve the performance (and he also provided an alternative for simple file copy)

Testing with third option (ReadAllBytes/WriteAllBytes)

My Code | svick code | 3rd
----------------------------
8       | 14         | 7
9       | 18         | 9
9       | 17         | 8
9       | 17         | 9

Note: measured in milliseconds, the 3rd option was always the fastest.

Upvotes: -1

svick

Reputation: 244848

StreamReader is for reading a sequence of characters, not bytes. The same applies to StreamWriter. Since treating compressed files as a stream of characters doesn't make any sense, you should use some implementation of Stream. If you want to get the stream as an array of bytes, you can use MemoryStream.

The exact reason why using character streams doesn't work is that they assume the UTF-8 encoding by default. If some byte is not valid UTF-8 (like the second byte of the GZip header, 0x8B), it's represented as the Unicode “replacement character” (U+FFFD). When the string is written back, that character is encoded using UTF-8 into something completely different from what was in the source.
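
To make the corruption concrete, here is a minimal sketch that round-trips just the two GZip magic bytes through UTF-8 in memory. It uses Encoding.UTF8 directly rather than StreamReader/StreamWriter, on the assumption that this mirrors their default decoding and encoding; the class name is only illustrative.

```csharp
using System;
using System.Text;

class Utf8RoundTripDemo
{
    static void Main()
    {
        // The first two bytes of a GZip file: the magic number 0x1F 0x8B.
        byte[] original = { 0x1F, 0x8B };

        // Decode as UTF-8. 0x8B is not a valid UTF-8 sequence on its own,
        // so it becomes the replacement character U+FFFD.
        string text = Encoding.UTF8.GetString(original);

        // Encode the string back to bytes. U+FFFD is written as EF BF BD.
        byte[] roundTripped = Encoding.UTF8.GetBytes(text);

        Console.WriteLine(BitConverter.ToString(original));     // 1F-8B
        Console.WriteLine(BitConverter.ToString(roundTripped)); // 1F-EF-BF-BD
    }
}
```

The second byte no longer matches the magic number, which is why GZipStream rejects the copied file. The WriteLine call in the question also appends a line terminator, so the copy would differ from the original even if every byte happened to be valid UTF-8.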

For example, to read a file from a stream, get it as an array of bytes and then write it to another file as a stream:

byte[] bytes;
using (var fileStream = new FileStream(@"C:\mydata.dat", FileMode.Open))
using (var memoryStream = new MemoryStream())
{
    fileStream.CopyTo(memoryStream);
    bytes = memoryStream.ToArray();
}

using (var memoryStream = new MemoryStream(bytes))
using (var fileStream = new FileStream(@"C:\mynewdata.dat", FileMode.Create))
{
    memoryStream.CopyTo(fileStream);
}

The CopyTo() method is only available in .NET 4 and later, but you can write your own if you use older versions.
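
For older versions, a hand-written replacement could look roughly like the sketch below. StreamHelpers and CopyStream are hypothetical names, not framework APIs, and the 4096-byte buffer size is an arbitrary choice.

```csharp
using System.IO;

static class StreamHelpers
{
    // Hypothetical helper (not part of the framework): copies every
    // remaining byte from source to destination through a fixed buffer.
    public static void CopyStream(Stream source, Stream destination)
    {
        byte[] buffer = new byte[4096]; // buffer size is an arbitrary assumption
        int read;

        // Read returns 0 once the end of the source stream is reached.
        while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
        {
            // Write only the bytes that were actually read this iteration.
            destination.Write(buffer, 0, read);
        }
    }
}
```

With this in place, fileStream.CopyTo(memoryStream) in the example above would become StreamHelpers.CopyStream(fileStream, memoryStream).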

Of course, for this simple example, there is no need to use streams. You can simply do:

byte[] bytes = File.ReadAllBytes(@"C:\mydata.dat");
File.WriteAllBytes(@"C:\mynewdata.dat", bytes);

Upvotes: 3
