Roman Starkov

Reputation: 61382

How to serialize a string holding nothing but "\r\n" to XML correctly?

We're using DataContractSerializer to serialize our data to XML. Recently we found a bug in how the string "\r\n" is saved and read back: it comes back as just "\n". Apparently the cause is using an XmlWriter with Indent = true:

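// using System.IO; using System.Runtime.Serialization; using System.Xml;
// public class Test { public string Line; }

var serializer = new DataContractSerializer(typeof(Test));

// Write through an XmlWriter with Indent = true (NewLineHandling is left at its default, Replace)
using (var fs = File.Open("C:/test.xml", FileMode.Create))
using (var wr = XmlWriter.Create(fs, new XmlWriterSettings() { Indent = true }))
    serializer.WriteObject(wr, new Test() { Line = "\r\n" });

// Read it back: test.Line is now "\n" instead of "\r\n"
Test test;
using (var fs = File.Open("C:/test.xml", FileMode.Open))
    test = (Test) serializer.ReadObject(fs);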
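A self-contained, in-memory variant of the repro above, for experimenting without touching C:/test.xml. It assumes a MemoryStream in place of the file and a hypothetical RoundTrip helper (not part of any library). With Indent = true and the default newline handling, the "\r\n" is expected to come back as "\n", as described.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;
using System.Xml;

// Same shape as the question's type; no attributes needed for DataContractSerializer.
public class Test { public string Line; }

public static class NewlineRepro
{
    // Hypothetical helper: serialize through an XmlWriter built from `settings`,
    // then read the bytes back with the serializer's own reader.
    public static string RoundTrip(string value, XmlWriterSettings settings)
    {
        var serializer = new DataContractSerializer(typeof(Test));
        using (var ms = new MemoryStream())
        {
            // Disposing the XmlWriter flushes it; CloseOutput defaults to false,
            // so the MemoryStream stays open for reading.
            using (var wr = XmlWriter.Create(ms, settings))
                serializer.WriteObject(wr, new Test { Line = value });

            ms.Position = 0; // rewind before reading
            return ((Test)serializer.ReadObject(ms)).Line;
        }
    }
}
```

Passing different XmlWriterSettings instances to RoundTrip makes it easy to compare writer configurations side by side.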

The obvious fix is to stop indenting XML, and indeed removing the "XmlWriter.Create" line makes the Line value roundtrip correctly, whether it's "\n", "\r\n" or anything else.
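For comparison, a sketch of the variant without XmlWriter.Create: WriteObject is handed the stream directly, so DataContractSerializer uses its own XML text writer. DirectRoundTrip is a hypothetical name; per the observation above, values should round-trip unchanged.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;

public class Test { public string Line; }

public static class DirectRoundTrip
{
    // No XmlWriterSettings involved: WriteObject(Stream, object) creates the
    // serializer's own text writer, as when the XmlWriter.Create line is removed.
    public static string RoundTrip(string value)
    {
        var serializer = new DataContractSerializer(typeof(Test));
        using (var ms = new MemoryStream())
        {
            serializer.WriteObject(ms, new Test { Line = value });
            ms.Position = 0; // rewind before reading
            return ((Test)serializer.ReadObject(ms)).Line;
        }
    }
}
```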

However, the way DataContractSerializer writes it still doesn't seem to be entirely safe or perhaps even correct - for example, just reading the resulting file with XML Notepad and saving it again destroys both "\n" and "\r\n" values completely.
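One plausible mechanism, offered as an assumption since XML Notepad's internals aren't covered here: a loader that discards insignificant whitespace treats a text node consisting only of line breaks as formatting and drops it. A minimal LINQ to XML sketch (WhitespaceDemo is a hypothetical name):

```csharp
using System;
using System.Xml.Linq;

public static class WhitespaceDemo
{
    // LoadOptions.None ignores insignificant whitespace, so an element whose
    // only content is line breaks comes back empty.
    public static string LoadIgnoringWhitespace(string xml) =>
        XElement.Parse(xml, LoadOptions.None).Value;

    // LoadOptions.PreserveWhitespace keeps whitespace-only text nodes.
    public static string LoadPreservingWhitespace(string xml) =>
        XElement.Parse(xml, LoadOptions.PreserveWhitespace).Value;
}
```

A value with any non-whitespace character in it is not affected, which fits strings that hold "nothing but" line breaks being the ones that disappear.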

What is the correct approach here? Is using XML as a format for serializing binary data a flawed concept? Are we wrong to expect that tools like XML Notepad won't break our data? Do we need to augment each and every string field that could contain such text with some special attribute, perhaps something to force CDATA?
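On the CDATA idea: XML 1.0 (section 2.11, End-of-Line Handling) requires a parser to translate every literal CR LF pair in its input into a single LF, and that includes text inside CDATA sections. So CDATA alone would likely not preserve a "\r\n", whereas a character reference such as &#xD; is not subject to that translation. A minimal LINQ to XML sketch (CDataDemo is a hypothetical name):

```csharp
using System;
using System.Xml.Linq;

public static class CDataDemo
{
    // Parse while keeping whitespace-only text, then return the element's text,
    // including the contents of any CDATA sections.
    public static string ParseValue(string xml) =>
        XElement.Parse(xml, LoadOptions.PreserveWhitespace).Value;
}
```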

Upvotes: 3

Views: 1379

Answers (3)

Pony440

Reputation: 1

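You can try setting NewLineHandling on the XmlWriterSettings you pass to XmlWriter.Create:

settings.NewLineHandling = NewLineHandling.Entitize; // settings: the XmlWriterSettings instance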
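A sketch of that suggestion applied to the question's repro, using a MemoryStream instead of a file and a hypothetical EntitizeRoundTrip helper. According to the XmlWriterSettings documentation, Entitize writes new-line characters as character references where needed, so that a normalizing XmlReader returns the original characters. Indent = true is kept here, which suggests that the newline handling, rather than indentation itself, is what matters.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;
using System.Xml;

public class Test { public string Line; }

public static class EntitizeRoundTrip
{
    public static string RoundTrip(string value)
    {
        var settings = new XmlWriterSettings
        {
            Indent = true,
            // Entitize escapes line-break characters as character references
            // where a normalizing reader would otherwise alter them.
            NewLineHandling = NewLineHandling.Entitize
        };
        var serializer = new DataContractSerializer(typeof(Test));
        using (var ms = new MemoryStream())
        {
            using (var wr = XmlWriter.Create(ms, settings))
                serializer.WriteObject(wr, new Test { Line = value });

            ms.Position = 0; // rewind before reading
            return ((Test)serializer.ReadObject(ms)).Line;
        }
    }
}
```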

Upvotes: -2

Scott Dorman

Reputation: 42516

Why is it important to distinguish between a string containing '\r\n' and an empty string? In general, when using data contract serialization you don't care about the XML format/structure or how it stores the data as long as it "round-trips" correctly.

This is how we use it:

StringBuilder sb = new StringBuilder(); // assumed: sb is a StringBuilder that receives the XML
DataContractSerializer serializer = CreateSerializer(this.GetType());
XmlWriterSettings settings = new XmlWriterSettings();
settings.Indent = true;
using (XmlWriter writer = XmlWriter.Create(sb, settings))
{
   serializer.WriteObject(writer, this);
   writer.Flush();
}


internal static T Deserialize<T>(Stream stream)
{
   DataContractSerializer serializer = CreateSerializer(typeof(T));
   return (T)serializer.ReadObject(stream);
}

public static DataContractSerializer CreateSerializer(Type type)
{
   // DataContractSerializer has no parameterless constructor; it needs the root type
   DataContractSerializer serializer = new DataContractSerializer(type);
   return serializer;
}

If I'm not mistaken, characters like linefeeds are not allowable characters within an XML value and would need to be either encoded or contained in a CDATA section. The data contract serializer does neither of these. Tools like XML Notepad are changing the data because they realize these aren't legal characters and remove them to create conformant XML.
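For reference, a sketch of how a conforming parser treats line breaks under XML 1.0 section 2.11 (End-of-Line Handling): a literal LF is ordinary character data, a literal CR LF is reported as a single LF, and a character reference such as &#xD; comes through unchanged. LineBreakDemo is a hypothetical name.

```csharp
using System;
using System.Xml.Linq;

public static class LineBreakDemo
{
    // Parse with whitespace preserved and return the element's text content.
    public static string Parse(string xml) =>
        XElement.Parse(xml, LoadOptions.PreserveWhitespace).Value;
}
```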

It actually shouldn't be surprising that string data can be returned differently between a binary serializer and an XML serializer. The binary serializer will serialize the exact binary representation of the data bit for bit and has no "rules" on what are legal characters, etc.

Upvotes: 1

Noon Silk

Reputation: 55072

Potentially you could use a CDATA section, but I do agree with your summary that using XML for serialising binary data is just plain wrong. Can you communicate the data another way?
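If the data really is binary, or a string whose exact characters must survive any XML tool, one option within DataContractSerializer is to carry it as a byte[] member, which the serializer writes as base64 text. The Payload type, the Base64RoundTrip helper and the UTF-8 conversion below are assumptions for illustration, not part of the original code.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;
using System.Text;
using System.Xml;

// Hypothetical type: raw bytes instead of a string.
public class Payload { public byte[] Data; }

public static class Base64RoundTrip
{
    // byte[] members are written as base64; the base64 alphabet contains no
    // whitespace, so nothing in it is subject to newline or whitespace handling.
    public static byte[] RoundTrip(byte[] data)
    {
        var serializer = new DataContractSerializer(typeof(Payload));
        using (var ms = new MemoryStream())
        {
            using (var wr = XmlWriter.Create(ms, new XmlWriterSettings { Indent = true }))
                serializer.WriteObject(wr, new Payload { Data = data });

            ms.Position = 0; // rewind before reading
            return ((Payload)serializer.ReadObject(ms)).Data;
        }
    }

    // Assumption: text is stored as its UTF-8 bytes.
    public static string RoundTripText(string text) =>
        Encoding.UTF8.GetString(RoundTrip(Encoding.UTF8.GetBytes(text)));
}
```

The trade-off is that the XML no longer shows the text in readable form, which is in line with treating such data as binary rather than as XML text.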

Upvotes: 3
