George2

Reputation: 45771

String escape into XML

Is there a C# function that can escape and unescape a string so that it can be used as the content of an XML element?

I am using VSTS 2008 + C# + .Net 3.0.

EDIT 1: I am building a simple, short XML file by concatenating strings and I do not use serialization, so I need to escape XML characters explicitly by hand. For example, to put a<b into <foo></foo>, I need to escape the string a<b before placing it inside element foo.

Upvotes: 108

Views: 127005

Answers (11)

Ramazan Binarbasi

Reputation: 787

The following functions do the job. I haven't benchmarked them against the XmlDocument approach, but I expect this to be much faster.

using System;
using System.IO;
using System.Text;
using System.Xml;

// Escapes text for use as XML element content, e.g. "a<b" -> "a&lt;b".
public static string XmlEncode(string value)
{
    // Fragment conformance lets a bare text node be written without a root element.
    var settings = new XmlWriterSettings 
    {
        ConformanceLevel = ConformanceLevel.Fragment
    };

    var builder = new StringBuilder();

    using (var writer = XmlWriter.Create(builder, settings))
        writer.WriteString(value); // the writer performs the escaping

    return builder.ToString();
}

// Reverses XmlEncode: parses the fragment and returns the value of its single text node.
public static string XmlDecode(string xmlEncodedValue)
{
    if (xmlEncodedValue.Length == 0)
        return xmlEncodedValue;

    var settings = new XmlReaderSettings
    {
        ConformanceLevel = ConformanceLevel.Fragment
    };

    // 'using var' declarations require C# 8 or later.
    using var stringReader = new StringReader(xmlEncodedValue);
    using var xmlReader = XmlReader.Create(stringReader, settings);
    if (!xmlReader.Read() || xmlReader.NodeType != XmlNodeType.Text)
    {
        throw new ArgumentException(
            "The specified value does not constitute an XML-encoded string.",
            nameof(xmlEncodedValue));
    }

    return xmlReader.Value;
}

Upvotes: 4

AllmanTool

Reputation: 1514

SecurityElement.Escape does this job for you.

Use this method to replace invalid characters in a string before using the string in a SecurityElement. If invalid characters are used in a SecurityElement without being escaped, an ArgumentException is thrown.

The following table shows the invalid XML characters and their escaped equivalents.

< ==> &lt;
> ==> &gt;
" ==> &quot;
' ==> &apos;
& ==> &amp;

https://learn.microsoft.com/en-us/dotnet/api/system.security.securityelement.escape?view=net-5.0
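A minimal sketch applying SecurityElement.Escape to the question's example of putting a<b into <foo></foo>. BuildFooElement is a hypothetical helper name. Note that Escape only replaces <, >, ", ' and &; it does not check for characters that XML forbids entirely.

```csharp
using System.Security;

public static class SecurityEscapeExample
{
    // Hypothetical helper: wraps escaped text in the question's <foo> element.
    public static string BuildFooElement(string text)
    {
        // SecurityElement.Escape replaces <, >, ", ' and & with entity references.
        return "<foo>" + SecurityElement.Escape(text) + "</foo>";
    }
}
```

For example, BuildFooElement("a<b") yields <foo>a&lt;b</foo>.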

Upvotes: 2

abberdeen

Reputation: 346

Using a third-party library (Newtonsoft.Json) as an alternative:

using Newtonsoft.Json;

public static string XmlEscape(string unescaped)
{
    if (unescaped == null) return null;
    return JsonConvert.SerializeObject(unescaped);
}

public static string XmlUnescape(string escaped)
{
    if (escaped == null) return null;
    return JsonConvert.DeserializeObject<string>(escaped);
}

Examples of escaped string:

a<b ==> "a&lt;b"

<foo></foo> ==> "foo&gt;&lt;/foo&gt;"

NOTE: In newer versions of Newtonsoft.Json, the code above may not escape these characters, so you need to specify explicitly how strings are escaped:

public static string XmlEscape(string unescaped)
{
    if (unescaped == null) return null;
    return JsonConvert.SerializeObject(unescaped, new JsonSerializerSettings()
    {
        StringEscapeHandling = StringEscapeHandling.EscapeHtml
    });
}

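Examples of escaped strings (these are JSON \u escapes inside a quoted JSON string, not XML entity references, so an XML parser keeps them as literal text):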

a<b ==> "a\u003cb"

<foo></foo> ==> "\u003cfoo\u003e\u003c/foo\u003e"

Upvotes: 2

Rick Strahl

Reputation: 17651

Another take, based on Jon Skeet's answer, that doesn't return the tags:

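using System;
using System.Xml.Linq;

class Program
{
    static void Main()
    {
        // Originally a LINQPad snippet using .Dump(); Console.WriteLine works anywhere.
        Console.WriteLine(XmlString("Brackets & stuff <> and \"quotes\""));
    }

    // Wraps the text in a throwaway <t> element and returns only the
    // serialized (escaped) text node, without the surrounding tags.
    public static string XmlString(string text)
    {
        return new XElement("t", text).LastNode.ToString();
    }
}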

This returns just the value passed in, in XML encoded format:

Brackets &amp; stuff &lt;&gt; and "quotes"

Upvotes: 7

Stefan Steiger

Reputation: 82186

WARNING: Necromancing

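Darin Dimitrov's answer and System.Security.SecurityElement.Escape(string s) are still not complete.

In XML 1.1, the simplest and safest way is to encode EVERYTHING as numeric character references,
like &#09; for \t.
This does not fully work in XML 1.0, which forbids most control characters (U+0001 to U+001F other than tab, line feed and carriage return) even as character references.
For XML 1.0, one possible workaround is to base64-encode the text containing such characters.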

// Usage (XmlUnescape is the function from Darin Dimitrov's answer):
// string encodedXml = SpecialXmlEscape("привет мир");
// Console.WriteLine(encodedXml);
// string decodedXml = XmlUnescape(encodedXml);
// Console.WriteLine(decodedXml);

// Encodes every character (Unicode code point) as a decimal reference &#N;.
public static string SpecialXmlEscape(string input)
{
    if (string.IsNullOrEmpty(input))
        return input;

    var sb = new System.Text.StringBuilder(input.Length * 6);

    for (int i = 0; i < input.Length; ++i)
    {
        // Combine a surrogate pair into one code point: a reference to a lone
        // surrogate (e.g. &#55357;) is not well-formed XML. ConvertToUtf32
        // throws ArgumentException if the string contains an unpaired surrogate.
        int codePoint = char.ConvertToUtf32(input, i);
        if (char.IsSurrogatePair(input, i))
            ++i; // skip the low surrogate already consumed

        sb.AppendFormat("&#{0};", codePoint);
    }

    return sb.ToString();
} // End Function SpecialXmlEscape

For XML 1.0, the base64 workaround:

public static string Base64Encode(string plainText)
{
    var plainTextBytes = System.Text.Encoding.UTF8.GetBytes(plainText);
    return System.Convert.ToBase64String(plainTextBytes);
}

public static string Base64Decode(string base64EncodedData)
{
    var base64EncodedBytes = System.Convert.FromBase64String(base64EncodedData);
    return System.Text.Encoding.UTF8.GetString(base64EncodedBytes);
}
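The base64 workaround for XML 1.0 is only needed when the text contains characters that XML 1.0 cannot represent at all. A minimal sketch of that check, assuming .NET Framework 4.0 or later (XmlConvert.IsXmlChar and XmlConvert.IsXmlSurrogatePair are not available in .NET 3.0); Xml10CharCheck and ContainsInvalidXml10Chars are hypothetical names.

```csharp
using System.Xml;

public static class Xml10CharCheck
{
    // Returns true if the string contains a character that cannot appear in an
    // XML 1.0 document at all, not even as a &#N; reference (e.g. U+0001).
    public static bool ContainsInvalidXml10Chars(string s)
    {
        for (int i = 0; i < s.Length; i++)
        {
            if (i + 1 < s.Length && char.IsSurrogatePair(s[i], s[i + 1]))
            {
                // Note the argument order of IsXmlSurrogatePair: (lowChar, highChar).
                if (!XmlConvert.IsXmlSurrogatePair(s[i + 1], s[i]))
                    return true;
                i++; // skip the low surrogate just checked
            }
            else if (!XmlConvert.IsXmlChar(s[i]))
            {
                // Also catches unpaired surrogates, which are never valid.
                return true;
            }
        }
        return false;
    }
}
```

Strings for which this returns true are the candidates for Base64Encode; everything else can be escaped normally.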

Upvotes: 5

CharlieBrown

Reputation: 4163

If, like me when I found this question, you want to escape XML node names (for example, when reading from an XML serialization), the easiest way is:

XmlConvert.EncodeName(string nameToEscape)

It also escapes spaces and any other characters that are not valid in XML element names.

http://msdn.microsoft.com/en-us/library/system.security.securityelement.escape%28VS.80%29.aspx
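A minimal round-trip sketch using XmlConvert.EncodeName and its counterpart XmlConvert.DecodeName; the class and method names are hypothetical. This escapes element names, not element content: characters that are invalid in a name become _xHHHH_ sequences.

```csharp
using System.Xml;

public static class NameEscapeExample
{
    // Turns arbitrary text into a legal XML element name, e.g. a space becomes _x0020_.
    public static string ToElementName(string text)
    {
        return XmlConvert.EncodeName(text);
    }

    // Reverses ToElementName by decoding the _xHHHH_ sequences.
    public static string FromElementName(string name)
    {
        return XmlConvert.DecodeName(name);
    }
}
```

For example, "Order Details" becomes Order_x0020_Details and decodes back to "Order Details".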

Upvotes: 10

Darin Dimitrov

Reputation: 1038790

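using System.Xml;

// Escape: InnerText stores the value as a text node; InnerXml serializes it with & < > escaped.
public static string XmlEscape(string unescaped)
{
    XmlDocument doc = new XmlDocument();
    XmlNode node = doc.CreateElement("root");
    node.InnerText = unescaped;
    return node.InnerXml;
}

// Unescape: InnerXml parses the markup; InnerText returns the decoded text.
// Throws XmlException if 'escaped' is not well-formed XML content.
public static string XmlUnescape(string escaped)
{
    XmlDocument doc = new XmlDocument();
    XmlNode node = doc.CreateElement("root");
    node.InnerXml = escaped;
    return node.InnerText;
}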

Upvotes: 84

Keith Robertson

Reputation: 841

Thanks to @sehe for the one-line escape:

var escaped = new System.Xml.Linq.XText(unescaped).ToString();

I add to it the one-line un-escape:

var unescapedAgain = System.Xml.XmlReader.Create(new StringReader("<r>" + escaped + "</r>")).ReadElementString();
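The two one-liners can be wrapped into a pair of helpers. This sketch (the class name XTextEscaper is hypothetical) uses the same calls but also disposes the XmlReader, which the one-liner leaves open.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class XTextEscaper
{
    // Escape: serialize a LINQ to XML text node; &, < and > are escaped.
    public static string Escape(string unescaped)
    {
        return new XText(unescaped).ToString();
    }

    // Unescape: wrap the text in a dummy <r> element and let XmlReader decode it.
    public static string Unescape(string escaped)
    {
        using (var reader = XmlReader.Create(new StringReader("<r>" + escaped + "</r>")))
        {
            return reader.ReadElementString();
        }
    }
}
```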

Upvotes: 32

TWA

Reputation: 12816

SecurityElement.Escape(string s)

Upvotes: 146

John Saunders

Reputation: 161773

George, it's simple. Always use the XML APIs to handle XML. They do all the escaping and unescaping for you.

Never create XML by appending strings.
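For the question's example, a minimal sketch using XmlWriter; BuildFoo is a hypothetical helper name. WriteElementString escapes the text content, so a<b needs no manual treatment.

```csharp
using System.IO;
using System.Xml;

public static class WriterExample
{
    // Writes <foo>text</foo>; XmlWriter escapes the text content itself.
    public static string BuildFoo(string text)
    {
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        using (var sw = new StringWriter())
        {
            // Dispose the writer first so everything is flushed to sw.
            using (var writer = XmlWriter.Create(sw, settings))
            {
                writer.WriteElementString("foo", text);
            }
            return sw.ToString();
        }
    }
}
```

For example, BuildFoo("a<b") produces <foo>a&lt;b</foo>.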

Upvotes: 12

Jon Skeet

Reputation: 1500514

EDIT: You say "I am concatenating simple and short XML file and I do not use serialization, so I need to explicitly escape XML character by hand".

I would strongly advise you not to do it by hand. Use the XML APIs to do it all for you - read in the original files, merge the two into a single document however you need to (you probably want to use XmlDocument.ImportNode), and then write it out again. You don't want to write your own XML parsers/formatters. Serialization is somewhat irrelevant here.

If you can give us a short but complete example of exactly what you're trying to do, we can probably help you to avoid having to worry about escaping in the first place.
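A minimal sketch of the merge approach with XmlDocument.ImportNode, assuming both inputs are well-formed XML strings and that the second document's root should become a child of the first document's root. AppendRoot is a hypothetical helper; a real merge would depend on the actual file structure.

```csharp
using System.Xml;

public static class MergeExample
{
    // Hypothetical helper: appends the root element of the second document
    // as the last child of the first document's root.
    public static string AppendRoot(string firstXml, string secondXml)
    {
        var target = new XmlDocument();
        target.LoadXml(firstXml);

        var source = new XmlDocument();
        source.LoadXml(secondXml);

        // ImportNode copies the node (deep = true) into the target document,
        // which is required before it can be appended there.
        XmlNode imported = target.ImportNode(source.DocumentElement, true);
        target.DocumentElement.AppendChild(imported);

        // Serialization re-escapes any text, so no manual escaping is needed.
        return target.OuterXml;
    }
}
```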


Original answer

It's not entirely clear what you mean, but normally XML APIs do this for you. You set the text in a node, and it will automatically escape anything it needs to. For example:

LINQ to XML example:

using System;
using System.Xml.Linq;

class Test
{
    static void Main()
    {
        XElement element = new XElement("tag",
                                        "Brackets & stuff <>");

        Console.WriteLine(element);
    }
}

DOM example:

using System;
using System.Xml;

class Test
{
    static void Main()
    {
        XmlDocument doc = new XmlDocument();
        XmlElement element = doc.CreateElement("tag");
        element.InnerText = "Brackets & stuff <>";
        Console.WriteLine(element.OuterXml);
    }
}

Output from both examples:

<tag>Brackets &amp; stuff &lt;&gt;</tag>

That's assuming you want XML escaping, of course. If you don't, please post more details.

Upvotes: 47
