James Parsons

Reputation: 6057

Writing a large amount of UTF8 bytes to a file results in massive bloat

So I was recently playing around, and attempted to generate a 1GB file.

StreamWriter writer = new StreamWriter(@"C:\Users\parsonsj\Desktop\data.dat");
Encoding utf8enc = UTF8Encoding.UTF8;

for (int i = 0; i < 1073741824; i++) {
    writer.Write(utf8enc.GetBytes("#"));
}
writer.Close();

My thinking was that since "#" encodes to a single byte in UTF-8, and 1GB is 1,073,741,824 bytes, writing that one character 1,073,741,824 times should produce a file of approximately 1GB.

I ran my little program and as expected, it started slowing things down and eating memory. I ended up prematurely killing it, and went to check the file size, curious how far I got. To my horror, the file was a whopping 13GB.

I'm not sure how it got so big. Perhaps I'm encoding it wrong. Perhaps there was some sort of crazy memory-leak related bug. I'm just confused.

Why was my file so big? Am I misunderstanding the encoding or the math?

Upvotes: 0

Views: 146

Answers (1)

Sergey Kalinichenko

Reputation: 726549

This is because writer.Write does not have an overload that takes a byte[]. Overload resolution therefore picks the overload taking System.Object, which falls back to calling ToString() on the array, so each iteration writes the string "System.Byte[]" to the stream. That string is 13 characters long, and 13 characters × 1,073,741,824 iterations is roughly 13GB, which matches the file size you saw.
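You can verify the overload resolution with a small standalone check; StringWriter stands in for the file here just so the result is easy to print (this snippet is my illustration, not code from the question):

using System;
using System.IO;
using System.Text;

class Demo {
    static void Main() {
        byte[] bytes = Encoding.UTF8.GetBytes("#");
        using (StringWriter writer = new StringWriter()) {
            // byte[] matches no specific Write overload, so Write(object)
            // runs and writes the array's ToString() result.
            writer.Write(bytes);
            Console.WriteLine(writer.ToString()); // prints: System.Byte[]
        }
    }
}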

Fix this by using FileStream's Write(byte[], int, int) method.
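A minimal sketch of that fix, reusing the path and loop bounds from the question (the using block is my addition, not part of the original answer):

using System.IO;
using System.Text;

class Program {
    static void Main() {
        byte[] hash = Encoding.UTF8.GetBytes("#");
        // FileStream.Write(byte[], int, int) writes the bytes themselves,
        // so each '#' occupies exactly one byte on disk.
        using (FileStream stream = new FileStream(@"C:\Users\parsonsj\Desktop\data.dat", FileMode.Create)) {
            for (int i = 0; i < 1073741824; i++) {
                stream.Write(hash, 0, hash.Length);
            }
        }
    }
}

As a side note, since StreamWriter defaults to UTF-8, calling writer.Write('#') with a char instead of a byte[] would also emit exactly one byte per call.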

Upvotes: 7
