rhughes

Reputation: 9583

Encoding.UTF8 as Default

I have just written a file using StreamWriter and found that I had to explicitly set the encoding to Encoding.UTF8 for it to write Chinese characters, otherwise it came out as gibberish.

I have two questions:

  1. How do I set the default encoding to Encoding.UTF8 so that I don't have to always set this explicitly?
  2. Why is Encoding.UTF8 or Encoding.Unicode not default for StreamWriter as .NET strings are UTF-16 by default?

Upvotes: 1

Views: 1667

Answers (2)

Hans Passant

Reputation: 941465

Why is Encoding.UTF8 or Encoding.Unicode not default for StreamWriter

UTF-8 is in fact the default for StreamWriter. From the MSDN documentation for the StreamWriter(String) constructor:

This constructor creates a StreamWriter with UTF-8 encoding without a Byte-Order Mark (BOM), so its GetPreamble method returns an empty byte array. The default UTF-8 encoding for this constructor throws an exception on invalid bytes. This behavior is different from the behavior provided by the encoding object in the Encoding.UTF8 property. To specify a BOM and determine whether an exception is thrown on invalid bytes, use a constructor that accepts an encoding object as a parameter, such as StreamWriter(String, Boolean, Encoding).
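The quoted behaviour can be checked directly. The following is a minimal sketch, assuming .NET 5 or later with C# top-level statements; the temp file name is a hypothetical placeholder. It compares the preamble of the encoding that StreamWriter(String) picks by default with the preamble of Encoding.UTF8.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical temp file, used only for this demonstration.
string path = Path.Combine(Path.GetTempPath(), "default-encoding-demo.txt");

byte[] defaultPreamble;
using (var writer = new StreamWriter(path)) // no encoding argument
{
    // The writer's encoding is UTF-8...
    Console.WriteLine(writer.Encoding.WebName); // utf-8
    // ...but its preamble (the BOM) is empty.
    defaultPreamble = writer.Encoding.GetPreamble();
}
File.Delete(path);

// Encoding.UTF8, by contrast, has the 3-byte BOM EF BB BF as its preamble.
byte[] utf8Preamble = Encoding.UTF8.GetPreamble();

Console.WriteLine(defaultPreamble.Length);              // 0
Console.WriteLine(BitConverter.ToString(utf8Preamble)); // EF-BB-BF
```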

So the real problem is the program that reads your file: it requires the BOM to reliably decode the text in the file. This is not entirely unusual.

Sadly, the StreamWriter class has to follow the Unicode standard, which stipulates that a BOM is optional. There is a lot to admire in what the Unicode Consortium has done, but this decision was frankly not one of them.

You'll have to accommodate the program; the Unicode standard allows that. You can trivially solve your problem by using the StreamWriter constructor that takes an Encoding argument and specifying Encoding.UTF8, whose preamble is the BOM.
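A minimal sketch of that fix, assuming .NET with C# top-level statements; the file name is a hypothetical placeholder. It uses the StreamWriter(String, Boolean, Encoding) overload named in the documentation quote and then dumps the bytes written.

```csharp
using System;
using System.IO;
using System.Text;

const string text = "\u4E2D\u6587"; // "中文", escaped so the source file encoding doesn't matter
string path = Path.Combine(Path.GetTempPath(), "utf8-bom-demo.txt"); // hypothetical file

// append = false overwrites the file; Encoding.UTF8 makes the writer
// emit the EF BB BF byte-order mark before the text.
using (var writer = new StreamWriter(path, false, Encoding.UTF8))
{
    writer.Write(text);
}

byte[] bytes = File.ReadAllBytes(path);
Console.WriteLine(BitConverter.ToString(bytes)); // EF-BB-BF-E4-B8-AD-E6-96-87
File.Delete(path);
```

The first three bytes are the BOM; the remaining six are the UTF-8 encoding of the two characters, exactly what the default constructor would have written without the BOM.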

Upvotes: 3

Jon

Reputation: 437376

I have just written a file using StreamWriter and found that I had to explicitly set the encoding to Encoding.UTF8 for it to write Chinese characters, otherwise it came out as gibberish.

That's not really the fault of StreamWriter; it's just that the producer and the consumer of your data don't agree on the encoding. If I speak English and you speak Portuguese, whose fault is it that we can't talk to each other?
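To make the producer/consumer mismatch concrete, here is a small sketch, assuming .NET 5 or later (where Encoding.Latin1 exists). Latin-1 is just an arbitrary stand-in for "a reader that guessed the wrong encoding"; it is not necessarily what the asker's reading program used.

```csharp
using System;
using System.Text;

const string original = "\u4E2D\u6587"; // "中文"

// Producer: encode as UTF-8, which is what StreamWriter does by default.
byte[] utf8Bytes = Encoding.UTF8.GetBytes(original);

// Consumer that assumes a different encoding (Latin-1, as an example).
string misread = Encoding.Latin1.GetString(utf8Bytes);

// Consumer that agrees with the producer.
string correct = Encoding.UTF8.GetString(utf8Bytes);

Console.WriteLine(misread.Length);      // 6: one Latin-1 character per byte, i.e. gibberish
Console.WriteLine(correct == original); // True
```

The bytes are identical in both cases; only the reader's assumption differs.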

How do I set the default encoding to Encoding.UTF8 so that I don't have to always set this explicitly?

You could subclass StreamWriter and, for example, make a Utf8StreamWriter that always passes Encoding.UTF8 to the base constructor (the Encoding property itself is read-only). But then you'd have to write Utf8StreamWriter everywhere, which is not really different from just setting the encoding.
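A sketch of such a subclass, assuming .NET with C# top-level statements. Utf8StreamWriter is the name suggested above; which constructors it exposes is an illustrative choice, not an existing .NET type, and the file name is hypothetical.

```csharp
using System;
using System.IO;
using System.Text;

string path = Path.Combine(Path.GetTempPath(), "utf8-subclass-demo.txt"); // hypothetical file
using (var writer = new Utf8StreamWriter(path))
{
    writer.Write("\u4E2D\u6587"); // "中文"
}
byte[] bytes = File.ReadAllBytes(path);
Console.WriteLine(BitConverter.ToString(bytes, 0, 3)); // EF-BB-BF
File.Delete(path);

// Hypothetical StreamWriter that always uses Encoding.UTF8 (UTF-8 with a BOM).
// StreamWriter.Encoding is get-only, so the encoding is fixed by forwarding
// Encoding.UTF8 to the base constructors rather than set afterwards.
public class Utf8StreamWriter : StreamWriter
{
    public Utf8StreamWriter(string path, bool append = false)
        : base(path, append, Encoding.UTF8) { }

    public Utf8StreamWriter(Stream stream)
        : base(stream, Encoding.UTF8) { }
}
```

Every call site still has to name Utf8StreamWriter instead of StreamWriter, which is the trade-off described above.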

I recommend just setting the encoding. It's not the end of the world. Note also that the constructor which wraps a Stream uses the same default as the path-based one, UTF-8 without a BOM, so switching to it will not add the BOM either.
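A quick sketch of the Stream-wrapping constructor's default, assuming .NET with C# top-level statements and using a MemoryStream in place of a file.

```csharp
using System;
using System.IO;

var stream = new MemoryStream();
using (var writer = new StreamWriter(stream)) // StreamWriter(Stream), no encoding argument
{
    writer.Write("\u4E2D\u6587"); // "中文"
}

// MemoryStream.ToArray still works after the writer has closed the stream.
byte[] bytes = stream.ToArray();
Console.WriteLine(BitConverter.ToString(bytes)); // E4-B8-AD-E6-96-87 (UTF-8, no BOM)
```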

Why is Encoding.UTF8 (or greater) not default for StreamWriter as .NET strings are UTF-16 by default?

Because a default is a choice the library designers had to make, and they chose UTF-8 without a BOM, which is not the same thing as Encoding.UTF8 (UTF-8 with a BOM). Your code might want UTF-8 with a BOM, but mine might want something else. Clearly there is no single choice that would satisfy both of us as the default.

Also, encodings are generally unrelated to one another, whatever the similarities in their names, so it doesn't make sense to say "or greater". UTF-8, UTF-16 and UTF-32 are different encodings: they can all encode the full set of Unicode characters, but each encodes them as different byte sequences.
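To illustrate, here is a sketch, assuming .NET with C# top-level statements, that encodes the single character U+4E2D with the three encodings .NET exposes as Encoding.UTF8, Encoding.Unicode (UTF-16 little-endian) and Encoding.UTF32 (UTF-32 little-endian). GetBytes does not include the preamble, so only the character's own bytes are shown.

```csharp
using System;
using System.Text;

const string text = "\u4E2D"; // "中", U+4E2D

// Same character, three Unicode encodings, three different byte sequences.
string utf8  = BitConverter.ToString(Encoding.UTF8.GetBytes(text));    // E4-B8-AD
string utf16 = BitConverter.ToString(Encoding.Unicode.GetBytes(text)); // 2D-4E (little-endian)
string utf32 = BitConverter.ToString(Encoding.UTF32.GetBytes(text));   // 2D-4E-00-00 (little-endian)

Console.WriteLine($"UTF-8:  {utf8}");
Console.WriteLine($"UTF-16: {utf16}");
Console.WriteLine($"UTF-32: {utf32}");
```

None of these byte sequences is a prefix or extension of another, which is why one encoding cannot be a "greater" version of another.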

Upvotes: 2
