Reputation: 94939
I am using this code to store my class:
FileStream stream = new FileStream(myPath, FileMode.Create);
XmlSerializer serializer = new XmlSerializer(typeof(myClass));
serializer.Serialize(stream, myClass);
stream.Close();
This writes a file that I can read just fine with XmlSerializer.Deserialize. The generated file, however, is not a proper text file. XmlSerializer.Serialize doesn't store a BOM, but still inserts multibyte characters. Thus it is implicitly declared an ANSI file (because we expect an XML file to be a text file, and a text file without a BOM is considered ANSI by Windows), showing ö as Ã¶ in some editors.
Is this a known bug? Or some setting that I'm missing?
Here is what the generated file starts with:
<?xml version="1.0"?>
<SvnProjects xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
The first byte in the file is hex 3C, i.e. the < character.
Upvotes: 2
Views: 1412
Reputation: 21
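FileStream stream = new FileStream(myPath, FileMode.Create);
XmlSerializer serializer = new XmlSerializer(typeof(myClass));
// Encoding.Unicode is UTF-16 little-endian; pass the stream opened above.
XmlWriter writer = new XmlTextWriter(stream, Encoding.Unicode);
serializer.Serialize(writer, myClass);
// Closing the writer flushes its buffer and closes the underlying stream.
writer.Close();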
Upvotes: 1
Reputation: 1062975
Having or not having a BOM is not a definition of a "proper text file". In fact, I'd say that the most typical format these days is UTF-8 without BOM; I don't think I've ever seen anyone actually use the UTF-8 BOM in real systems! But if you want a BOM, that's fine: just pass the correct Encoding in. If you want UTF-8 with BOM:
using (var writer = XmlWriter.Create(myPath, s_settings))
{
    XmlSerializer serializer = new XmlSerializer(typeof(MyClass));
    serializer.Serialize(writer, obj);
}
with:
static readonly XmlWriterSettings s_settings =
    new XmlWriterSettings { Encoding = new UTF8Encoding(true) };
The result of this is a file that starts EF-BB-BF, the UTF-8 BOM.
If you want a different encoding, then just replace new UTF8Encoding with whatever you did want, remembering to enable the BOM.
(Note: the static Encoding.UTF8 instance has the BOM enabled, but IMO it is better to be very explicit here if you specifically intend to use a BOM, just like you should be very explicit about what Encoding you intended to use.)
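To check the effect of the BOM flag without touching the disk, something like the following sketch can help. It serializes into a MemoryStream instead of a file and prints the first three bytes. Project and BomDemo are hypothetical names made up for this illustration; only the XmlWriterSettings / UTF8Encoding usage comes from the answer above.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical stand-in for the class being serialized (not from the question).
public class Project
{
    public string Name { get; set; } = "Köln";
}

public static class BomDemo
{
    // Serializes value through an XmlWriter that uses the given encoding
    // and returns the raw bytes that would otherwise be written to the file.
    public static byte[] SerializeToBytes(object value, Encoding encoding)
    {
        var settings = new XmlWriterSettings { Encoding = encoding };
        using (var buffer = new MemoryStream())
        {
            // Disposing the writer flushes it; CloseOutput defaults to false,
            // so the MemoryStream itself is only disposed by the outer using.
            using (var writer = XmlWriter.Create(buffer, settings))
            {
                new XmlSerializer(value.GetType()).Serialize(writer, value);
            }
            return buffer.ToArray();
        }
    }

    public static void Main()
    {
        byte[] withBom = SerializeToBytes(new Project(), new UTF8Encoding(true));
        byte[] withoutBom = SerializeToBytes(new Project(), new UTF8Encoding(false));

        Console.WriteLine(BitConverter.ToString(withBom, 0, 3));    // EF-BB-BF
        Console.WriteLine(BitConverter.ToString(withoutBom, 0, 3)); // 3C-3F-78, i.e. "<?x"
    }
}
```

The only difference between the two calls is the constructor argument of UTF8Encoding, which is why being explicit about it is easier to review than relying on Encoding.UTF8.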
Edit: the key difference here is that Serialize(Stream, object) ends up using:
XmlTextWriter xmlWriter = new XmlTextWriter(stream, encoding: null) {
    Formatting = Formatting.Indented,
    Indentation = 2
};
which then ends up using:
public StreamWriter(Stream stream) : this(stream,
    encoding: UTF8NoBOM, // <==== THIS IS THE PROBLEM
    bufferSize: 1024, leaveOpen: false)
{
}
so: UTF-8 without BOM is the default if you use that API.
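A rough way to see that default in action, and why some editors then show garbage, is to serialize through the Stream overload and decode the bytes with a single-byte encoding. This is only a sketch: Project and DefaultEncodingDemo are hypothetical names, and ISO-8859-1 is used here as a stand-in for the "ANSI" code page the editor falls back to.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml.Serialization;

// Hypothetical stand-in for the class being serialized (not from the question).
public class Project
{
    public string Name { get; set; } = "Köln";
}

public static class DefaultEncodingDemo
{
    // Uses the Serialize(Stream, object) overload discussed above,
    // so the encoding is whatever XmlSerializer picks by default.
    public static byte[] SerializeViaStream(object value)
    {
        using (var buffer = new MemoryStream())
        {
            new XmlSerializer(value.GetType()).Serialize(buffer, value);
            return buffer.ToArray();
        }
    }

    public static void Main()
    {
        byte[] bytes = SerializeViaStream(new Project());

        // With no BOM, the first byte is the '<' of the XML declaration.
        Console.WriteLine(bytes[0].ToString("X2"));

        // Decoding correctly as UTF-8 round-trips the text...
        Console.WriteLine(Encoding.UTF8.GetString(bytes).Contains("Köln"));

        // ...while a single-byte code page turns each UTF-8 byte into its own character.
        Encoding singleByte = Encoding.GetEncoding("ISO-8859-1");
        Console.WriteLine(singleByte.GetString(bytes).Contains("KÃ¶ln"));
    }
}
```

On the setup described in the question, the first line printed should be 3C, matching the observation that the file starts with the < character.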
Upvotes: 3