user4148144

Custom XML-like Syntax Parsing

I'm attempting to replicate a dialogue system from a game that has control codes, which are HTML/XML-like tags that dictate behavior of a text bubble. For example, changing the color of a piece of text would be like <co FF0000FF>Hello World!</co>. These control codes are not required in the text, so Hello <co FF0000FF>World!</co> or simply Hello World should parse as well.

I've attempted to make it similar to XML to ease parsing, but XML requires a root-level tag to parse successfully, and the text may or may not have any control codes. For example, I'm able to parse the following fine with XElement.

string Text = "<co value=\"FF0000FF\">Hello World!</co>";
XElement.Parse(Text);

However, the following fails with an XmlException ("Data at the root level is invalid. Line 1, position 1."):

string Text = "Hello <co value=\"FF0000FF\">World!</co>";
XElement.Parse(Text);

What would be a good approach to handling this? Is there a way to handle parsing XML elements in a string without requiring a strict XML syntax, or is there another type of parser I can use to achieve what I want?

Upvotes: 1

Views: 245

Answers (3)

Charles Mager

Reputation: 26213

If the XML elements within your text will always be well-formed, then you can use the XML libraries to do this.

You can either wrap your text inside a root element, call XElement.Parse, and read the child nodes, or you can use the lower-level XmlReader API, which can parse the nodes of an XML fragment directly:

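using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static IEnumerable<XNode> Parse(string text)
{
    // Fragment conformance allows several top-level nodes, including text.
    var settings = new XmlReaderSettings
    {
        ConformanceLevel = ConformanceLevel.Fragment
    };

    using (var sr = new StringReader(text))
    using (var xr = XmlReader.Create(sr, settings))
    {
        // Advance to the first content node.
        xr.MoveToContent();

        // XNode.ReadFrom consumes one node and leaves the reader on the next.
        while (!xr.EOF)
        {
            yield return XNode.ReadFrom(xr);
        }
    }
}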

Using it like this:

foreach (var node in Parse("Hello <co value=\"FF0000FF\">World!</co>"))
{
    Console.WriteLine($"{node.GetType().Name}: {node}");
}

Would output this:

XText: Hello
XElement: <co value="FF0000FF">World!</co>

See this fiddle for a working demo.
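
As a follow-up to the fragment approach, here is a rough sketch of how the parsed nodes might be flattened into (text, color) runs for a text bubble. The names DialogueRuns, ToRuns and Collect, and the default color FFFFFFFF, are made up for illustration and are not part of the original answer. It assumes the co element carries its color in a value attribute, as in the question.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class DialogueRuns
{
    // Hypothetical helper: returns each piece of text with the color in effect.
    // The default color is an assumption, not something the question specifies.
    public static List<(string Text, string Color)> ToRuns(string text, string defaultColor = "FFFFFFFF")
    {
        var settings = new XmlReaderSettings { ConformanceLevel = ConformanceLevel.Fragment };
        var runs = new List<(string Text, string Color)>();

        using (var sr = new StringReader(text))
        using (var xr = XmlReader.Create(sr, settings))
        {
            xr.MoveToContent();
            while (!xr.EOF)
            {
                Collect(XNode.ReadFrom(xr), defaultColor, runs);
            }
        }
        return runs;
    }

    // Walks a node recursively; a <co value="..."> element changes the color
    // for everything nested inside it, other elements inherit the current color.
    private static void Collect(XNode node, string color, List<(string Text, string Color)> runs)
    {
        if (node is XText t)
        {
            runs.Add((t.Value, color));
        }
        else if (node is XElement e)
        {
            string inner = e.Name.LocalName == "co"
                ? ((string)e.Attribute("value") ?? color)
                : color;
            foreach (var child in e.Nodes())
            {
                Collect(child, inner, runs);
            }
        }
    }
}
```

Calling DialogueRuns.ToRuns on the question's second example should give one run for the leading text in the default color and one run for "World!" in FF0000FF. Plain text without any control codes comes back as a single run.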

Upvotes: 0

Michael Kay

Reputation: 163458

If the only difference between your XML-like fragments and real XML is the absence of a root element, then simply wrap the fragment in a dummy root element before parsing:

parse("<dummy>" + fragment + "</dummy>")
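
In C#, the wrapping idea might look roughly like the sketch below. The class name DialogueXml, the method name ParseFragment, and the wrapper element name dummy are arbitrary choices for illustration. It assumes the fragment is otherwise well-formed XML.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class DialogueXml
{
    // Hypothetical helper: wraps the fragment in a throwaway root element
    // and returns the root's child nodes (text and elements) in order.
    public static List<XNode> ParseFragment(string fragment)
    {
        // PreserveWhitespace keeps whitespace-only text between tags.
        XElement root = XElement.Parse(
            "<dummy>" + fragment + "</dummy>",
            LoadOptions.PreserveWhitespace);

        return root.Nodes().ToList();
    }
}
```

With this, both "Hello World" and "Hello <co value=\"FF0000FF\">World!</co>" parse: the first yields a single XText node, the second an XText node followed by an XElement.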

If there are other differences, for example attributes not being in quotes or attribute names starting with a digit, then an XML parser isn't going to be much use to you and you will need to write your own. Alternatively, an HTML parser such as validator.nu might handle it, if you're lucky.
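
If you do end up writing your own, a small regex-based tokenizer may be enough for the unquoted form from the question (<co FF0000FF>...</co>). The sketch below is only a starting point. The name ControlCodeTokenizer and the tag grammar (letters for the tag name, an optional single argument after whitespace) are assumptions. It does not check that tags are balanced or properly nested.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ControlCodeTokenizer
{
    // Matches <name>, <name argument> or </name>. This grammar is an
    // assumption based on the single example in the question.
    private static readonly Regex Tag =
        new Regex(@"<(/?)([A-Za-z]+)(?:\s+([^<>]*))?>");

    // Splits the input into "Text", "Open" and "Close" tokens.
    // For Text tokens, Value holds the text; for tags, Value holds the tag
    // name and Argument holds whatever followed the name (possibly empty).
    public static List<(string Kind, string Value, string Argument)> Tokenize(string input)
    {
        var tokens = new List<(string Kind, string Value, string Argument)>();
        int pos = 0;

        foreach (Match m in Tag.Matches(input))
        {
            // Emit any plain text that precedes this tag.
            if (m.Index > pos)
            {
                tokens.Add(("Text", input.Substring(pos, m.Index - pos), ""));
            }

            string kind = m.Groups[1].Value == "/" ? "Close" : "Open";
            tokens.Add((kind, m.Groups[2].Value, m.Groups[3].Value));
            pos = m.Index + m.Length;
        }

        // Emit trailing text after the last tag, if any.
        if (pos < input.Length)
        {
            tokens.Add(("Text", input.Substring(pos), ""));
        }
        return tokens;
    }
}
```

For "Hello <co FF0000FF>World!</co>" this produces four tokens: the text "Hello ", an Open token for co with argument FF0000FF, the text "World!", and a Close token for co. A stack of open tags on top of this would be one way to validate nesting.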

Upvotes: 1

Gaurang Dave

Reputation: 4046

You can try HtmlAgilityPack.

Install the NuGet package by running this command: Install-Package HtmlAgilityPack

The following sample returns all the descendant nodes. I did not pass a level to Descendants, but you can extend the code as needed.

It will parse your custom format.

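using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

string Text = "Hello <co value=\"FF0000FF\">World!</co>";

// Decode HTML entities before handing the text to the parser.
Text = System.Net.WebUtility.HtmlDecode(Text);
HtmlDocument result = new HtmlDocument();
result.LoadHtml(Text);

// Collect every node below the document root, text nodes included.
List<HtmlNode> nodes = result.DocumentNode.Descendants().ToList();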

Upvotes: 0
