xyz

Reputation: 27827

Why isn't the Byte Order Mark emitted from UTF8Encoding.GetBytes?

The snippet says it all :-)

UTF8Encoding enc = new UTF8Encoding(true/*include Byte Order Mark*/);
byte[] data = enc.GetBytes("a");
// data has length 1.
// I expected the BOM to be included. What's up?

Upvotes: 13

Views: 2985

Answers (4)

MSalters

Reputation: 179779

Note that in general, you don't need the Byte Order Mark for UTF-8 anyway. Its main purpose is to tell UTF-16 BE and UTF-16 LE apart. There is no such thing as UTF-8 LE and UTF-8 BE.
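To make the difference concrete, here is a small sketch that prints the preamble each encoding reports. The class and method names (PreambleDemo, Hex) are made up for illustration; the encodings and GetPreamble are the standard System.Text ones.

```csharp
using System;
using System.Text;

static class PreambleDemo
{
    // Hypothetical helper: formats an encoding's preamble as hex, e.g. "EF-BB-BF".
    public static string Hex(Encoding enc)
    {
        return BitConverter.ToString(enc.GetPreamble());
    }

    public static void Print()
    {
        // UTF-8 has a single byte order, so its BOM only acts as a signature.
        Console.WriteLine("UTF-8:     " + Hex(new UTF8Encoding(true)));
        // The two UTF-16 variants report different preambles, which is what
        // lets a reader tell them apart.
        Console.WriteLine("UTF-16 LE: " + Hex(Encoding.Unicode));
        Console.WriteLine("UTF-16 BE: " + Hex(Encoding.BigEndianUnicode));
    }
}
```

Calling PreambleDemo.Print() should show three bytes for UTF-8 and two bytes, in opposite order, for the two UTF-16 variants.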

Upvotes: 2

xyz

Reputation: 27827

Thank you both. The following works, and LINQ makes the combination simple :-)

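using System.Linq; // needed for Concat and ToArray
using System.Text;

UTF8Encoding enc = new UTF8Encoding(true);
byte[] data = enc.GetBytes("a");
// Prepend the BOM (the preamble) to the encoded bytes.
byte[] combo = enc.GetPreamble().Concat(data).ToArray();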

Upvotes: 9

Marc Gravell

Reputation: 1062492

Because it is expected that GetBytes() will be called lots of times... you need to use:

byte[] preamble = enc.GetPreamble();

(only call it at the start of a sequence) and write that; this is where the BOM lives.

Upvotes: 3

Jon Skeet

Reputation: 1499770

You wouldn't want it to be used for every call to GetBytes, otherwise you'd have no way of (say) writing a file a line at a time.

By exposing it with GetPreamble, callers can insert the preamble just at the appropriate point (i.e. at the start of their data). I agree that the documentation could be a lot clearer though.
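As a rough sketch of that pattern (the BomDemo class and EncodeLines method are hypothetical names, and a MemoryStream stands in for a real file), the preamble is written once and GetBytes is then called per line:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

static class BomDemo
{
    // Hypothetical helper: encodes lines as UTF-8, emitting the BOM once up front.
    public static byte[] EncodeLines(IEnumerable<string> lines)
    {
        var enc = new UTF8Encoding(true);
        using (var stream = new MemoryStream())
        {
            // Write the preamble only at the start of the data.
            byte[] preamble = enc.GetPreamble();
            stream.Write(preamble, 0, preamble.Length);

            // GetBytes never adds the BOM, so it is safe to call once per line.
            foreach (string line in lines)
            {
                byte[] data = enc.GetBytes(line + "\n");
                stream.Write(data, 0, data.Length);
            }
            return stream.ToArray();
        }
    }
}
```

For example, BomDemo.EncodeLines(new[] { "a", "b" }) should yield the three BOM bytes followed by the bytes for "a\n" and "b\n", with no BOM repeated between lines.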

Upvotes: 18
