Thorsten Kettner

Reputation: 94939

XmlSerializer.Serialize BOM missing

I am using this code to store my class:

FileStream stream = new FileStream(myPath, FileMode.Create);
XmlSerializer serializer = new XmlSerializer(typeof(myClass));
serializer.Serialize(stream, myClass);
stream.Close();

This writes a file that I can read back fine with XmlSerializer.Deserialize. The generated file, however, is not a proper text file. XmlSerializer.Serialize doesn't store a BOM, but still writes multibyte characters. The file is therefore implicitly treated as ANSI (because we expect an XML file to be a text file, and a text file without a BOM is considered ANSI by Windows), so some editors show ö as Ã¶.

Is this a known bug? Or some setting that I'm missing?

Here is what the generated file starts with:

<?xml version="1.0"?>
<SvnProjects xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">

The first byte in the file is hex 3C, i.e. the <.

Upvotes: 2

Views: 1412

Answers (2)

m r

Reputation: 21

  1. You must serialize an instance, not a class definition.
  2. To get Unicode output, you must pass an XmlWriter or TextWriter to Serialize.
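// myObject stands for the instance to serialize (an object of type myClass, not the type itself)
FileStream stream = new FileStream(myPath, FileMode.Create);
XmlSerializer serializer = new XmlSerializer(typeof(myClass));
// Encoding.Unicode is UTF-16 little-endian and emits a BOM
XmlWriter writer = new XmlTextWriter(stream, Encoding.Unicode);
serializer.Serialize(writer, myObject);
writer.Close(); // flushes the writer and closes the underlying stream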

Upvotes: 1

Marc Gravell

Reputation: 1062975

Having or not having a BOM is not a definition of a "proper text file". In fact, I'd say that the most typical format these days is UTF-8 without BOM; I don't think I've ever seen anyone actually use the UTF-8 BOM in real systems! But: if you want a BOM, that's fine: just pass the correct Encoding in; if you want UTF-8 with BOM:

using (var writer = XmlWriter.Create(myPath, s_settings))
{
    XmlSerializer serializer = new XmlSerializer(typeof(MyClass));
    serializer.Serialize(writer, obj);
}

with:

static readonly XmlWriterSettings s_settings =
    new XmlWriterSettings { Encoding = new UTF8Encoding(true) };

The result of this is a file that starts EF-BB-BF, the UTF-8 BOM.
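To check this on your own machine, a minimal sketch along these lines should do. It assumes a hypothetical stand-in class named SvnProjects (only the root element name comes from the question) and a hypothetical helper WritesUtf8Bom; it writes to a temp file and inspects the first three bytes.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical stand-in for the question's class; the property is an assumption.
public class SvnProjects
{
    public string Name { get; set; } = "Köln";
}

public static class BomCheck
{
    // Serializes obj to path using UTF-8 with BOM enabled, then reports
    // whether the file on disk starts with the bytes EF BB BF.
    public static bool WritesUtf8Bom(string path, SvnProjects obj)
    {
        var settings = new XmlWriterSettings { Encoding = new UTF8Encoding(true) };
        using (var writer = XmlWriter.Create(path, settings))
        {
            new XmlSerializer(typeof(SvnProjects)).Serialize(writer, obj);
        }

        byte[] bytes = File.ReadAllBytes(path);
        return bytes.Length >= 3
            && bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF;
    }

    public static void Main()
    {
        string path = Path.GetTempFileName();
        try
        {
            Console.WriteLine(WritesUtf8Bom(path, new SvnProjects()));
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

If the helper reports false, the writer in use is probably not the one configured with these settings.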

If you want a different encoding, then just replace new UTF8Encoding with whatever you did want, remembering to enable the BOM.

(note: the static Encoding.UTF8 instance has the BOM enabled, but IMO it is better to be very explicit here if you specifically intend to use a BOM, just like you should be very explicit about what Encoding you intended to use)
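If you are unsure whether a given Encoding instance emits a BOM, you can ask it via GetPreamble(). The sketch below is only an illustration; PreambleDemo and PreambleHex are hypothetical names, while GetPreamble and BitConverter.ToString are the standard framework calls.

```csharp
using System;
using System.Text;

public static class PreambleDemo
{
    // Returns the encoding's preamble (BOM) as dash-separated hex,
    // or an empty string when the encoding emits no preamble.
    public static string PreambleHex(Encoding encoding)
    {
        return BitConverter.ToString(encoding.GetPreamble());
    }

    public static void Main()
    {
        Console.WriteLine("Encoding.UTF8:           " + PreambleHex(Encoding.UTF8));           // EF-BB-BF
        Console.WriteLine("new UTF8Encoding(true):  " + PreambleHex(new UTF8Encoding(true)));  // EF-BB-BF
        Console.WriteLine("new UTF8Encoding(false): " + PreambleHex(new UTF8Encoding(false))); // (empty)
        Console.WriteLine("Encoding.Unicode:        " + PreambleHex(Encoding.Unicode));        // FF-FE
    }
}
```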


Edit: the key difference here is that Serialize(Stream, object) ends up using:

XmlTextWriter xmlWriter = new XmlTextWriter(stream, encoding: null) {
    Formatting = Formatting.Indented,
    Indentation = 2
};

which then ends up using:

public StreamWriter(Stream stream) : this(stream,
    encoding: UTF8NoBOM, // <==== THIS IS THE PROBLEM
    bufferSize: 1024, leaveOpen: false)
{
}

so: UTF-8 without BOM is the default if you use that API.
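The StreamWriter default can be seen in isolation, without XmlSerializer involved. This is just a sketch to illustrate the point; StreamWriterDefaultDemo and Write are hypothetical names, and passing null here is only this helper's way of selecting the StreamWriter(Stream) constructor.

```csharp
using System;
using System.IO;
using System.Text;

public static class StreamWriterDefaultDemo
{
    // Writes text through a StreamWriter and returns the raw bytes that reach the stream.
    // A null encoding selects the StreamWriter(Stream) constructor and its built-in default.
    public static byte[] Write(string text, Encoding encoding)
    {
        using (var ms = new MemoryStream())
        {
            using (var writer = encoding == null
                ? new StreamWriter(ms)
                : new StreamWriter(ms, encoding))
            {
                writer.Write(text);
            }
            return ms.ToArray(); // ToArray still works after the stream has been closed
        }
    }

    public static void Main()
    {
        Console.WriteLine(BitConverter.ToString(Write("<", null)));                   // 3C
        Console.WriteLine(BitConverter.ToString(Write("<", new UTF8Encoding(true)))); // EF-BB-BF-3C
    }
}
```

The first line matches what the question observed: the output starts directly with 3C, the <.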

Upvotes: 3
