I'm trying to replicate a dialogue system from a game that uses control codes: HTML/XML-like tags that dictate the behavior of a text bubble. For example, changing the color of a piece of text would look like <co FF0000FF>Hello World!</co>. These control codes are optional, so Hello <co FF0000FF>World!</co> or simply Hello World should parse as well.
I've attempted to make it similar to XML to ease parsing, but XML requires a root-level tag to parse successfully, and the text may or may not have any control codes. For example, I'm able to parse the following fine with XElement.
string Text = "<co value=\"FF0000FF\">Hello World!</co>";
XElement.Parse(Text);
However, the following fails with an XmlException ("Data at the root level is invalid. Line 1, position 1."):
string Text = "Hello <co value=\"FF0000FF\">World!</co>";
XElement.Parse(Text);
What would be a good approach to handling this? Is there a way to handle parsing XML elements in a string without requiring a strict XML syntax, or is there another type of parser I can use to achieve what I want?
Upvotes: 1
If the XML elements within your text will always be well-formed, then you can use the XML libraries to do this.
You can either wrap your text inside a root element, parse it with XElement.Parse, and read the child nodes, or you can use some lower-level APIs to parse the nodes of an XML fragment directly:
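using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static IEnumerable<XNode> Parse(string text)
{
    // Fragment conformance lets the reader accept text and multiple top-level nodes.
    var settings = new XmlReaderSettings
    {
        ConformanceLevel = ConformanceLevel.Fragment
    };

    using (var sr = new StringReader(text))
    using (var xr = XmlReader.Create(sr, settings))
    {
        // Move to the first content node, then read one node at a time until EOF.
        xr.MoveToContent();
        while (!xr.EOF)
        {
            yield return XNode.ReadFrom(xr);
        }
    }
}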
Using it like this:
foreach (var node in Parse("Hello <co value=\"FF0000FF\">World!</co>"))
{
    Console.WriteLine($"{node.GetType().Name}: {node}");
}
Would output this:
XText: Hello
XElement: <co value="FF0000FF">World!</co>
See this fiddle for a working demo.
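To go from the parsed nodes to something a text bubble could draw, one option is to flatten them into (text, color) runs. This is only a sketch under a few assumptions: the DialogueRuns class, the ToRuns and Collect methods, and the tuple shape are names invented for illustration, only the co tag changes color, and a null color means "use the default". It repeats the fragment-reading approach from the Parse method above so that it compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class DialogueRuns
{
    // Reads top-level nodes from an XML fragment (no root element required).
    static IEnumerable<XNode> ReadNodes(string text)
    {
        var settings = new XmlReaderSettings { ConformanceLevel = ConformanceLevel.Fragment };
        using (var sr = new StringReader(text))
        using (var xr = XmlReader.Create(sr, settings))
        {
            xr.MoveToContent();
            while (!xr.EOF)
            {
                yield return XNode.ReadFrom(xr);
            }
        }
    }

    // Flattens the node tree into runs of text, each tagged with the color in effect.
    public static List<(string Text, string Color)> ToRuns(string text)
    {
        var runs = new List<(string Text, string Color)>();
        Collect(ReadNodes(text), null, runs);
        return runs;
    }

    static void Collect(IEnumerable<XNode> nodes, string color, List<(string Text, string Color)> runs)
    {
        foreach (XNode node in nodes)
        {
            if (node is XText t)
            {
                runs.Add((t.Value, color));
            }
            else if (node is XElement e)
            {
                // Assumption: only <co value="..."> changes the color; any other tag inherits it.
                string inner = e.Name.LocalName == "co"
                    ? (string)e.Attribute("value") ?? color
                    : color;
                Collect(e.Nodes(), inner, runs);
            }
        }
    }

    public static void Main()
    {
        foreach (var run in ToRuns("Hello <co value=\"FF0000FF\">World!</co>"))
        {
            string color = run.Color ?? "default";
            Console.WriteLine($"[{color}] {run.Text}");
        }
    }
}
```

Nested tags inherit the color of the nearest enclosing co element, so the recursion carries the current color down as a parameter instead of keeping a separate stack.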
Upvotes: 0
If the only difference between your XML-like fragments and real XML is the absence of a root element, then simply wrap the fragment in a dummy root element before parsing:
parse("<dummy>" + fragment + "</dummy>")
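In C#, that wrapping step might look like the sketch below. The WrappedFragment class and ParseNodes method are hypothetical names, and it assumes the fragment is otherwise well-formed XML (quoted attributes, no bare & or < in the text).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class WrappedFragment
{
    // Wraps the fragment in a throwaway root so XElement.Parse accepts it,
    // then returns only the root's children (text and element nodes).
    public static List<XNode> ParseNodes(string fragment)
    {
        // PreserveWhitespace keeps whitespace-only text, e.g. a space between two tags.
        XElement root = XElement.Parse(
            "<dummy>" + fragment + "</dummy>",
            LoadOptions.PreserveWhitespace);
        return root.Nodes().ToList();
    }

    public static void Main()
    {
        foreach (XNode node in ParseNodes("Hello <co value=\"FF0000FF\">World!</co>"))
        {
            Console.WriteLine($"{node.GetType().Name}: {node}");
        }
    }
}
```

Plain text with no control codes goes through the same path and comes back as a single XText node.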
If there are other differences, for example attributes not being in quotes or attribute names starting with a digit, then an XML parser isn't going to be much use to you and you will need to write your own. Alternatively, an HTML parser such as validator.nu might handle it, if you're lucky.
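For the unquoted form from the question, such as <co FF0000FF>, a hand-written tokenizer can be fairly small. The sketch below is one possible starting point, not a full parser: the ControlCodeTokenizer class, the Tokenize method, and the token kinds "text", "open", and "close" are made up for illustration. It assumes tag names consist of letters only and that a tag's argument never contains '>'.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ControlCodeTokenizer
{
    // Matches an opening tag like <co FF0000FF> or a closing tag like </co>.
    // Group 1: "/" for closing tags, group 2: tag name, group 3: optional argument.
    static readonly Regex Tag = new Regex(@"<(/?)([A-Za-z]+)(?:\s+([^>]*))?>");

    public static List<(string Kind, string Name, string Value)> Tokenize(string input)
    {
        var tokens = new List<(string Kind, string Name, string Value)>();
        int pos = 0;
        foreach (Match m in Tag.Matches(input))
        {
            // Anything between the previous tag and this one is plain text.
            if (m.Index > pos)
            {
                tokens.Add(("text", null, input.Substring(pos, m.Index - pos)));
            }
            bool closing = m.Groups[1].Value == "/";
            string argument = m.Groups[3].Success ? m.Groups[3].Value : null;
            tokens.Add((closing ? "close" : "open", m.Groups[2].Value, argument));
            pos = m.Index + m.Length;
        }
        // Trailing text after the last tag (or the whole input if there were no tags).
        if (pos < input.Length)
        {
            tokens.Add(("text", null, input.Substring(pos)));
        }
        return tokens;
    }

    public static void Main()
    {
        foreach (var token in Tokenize("Hello <co FF0000FF>World!</co>"))
        {
            Console.WriteLine($"{token.Kind}: {token.Name} {token.Value}");
        }
    }
}
```

This only splits the input into tokens. It does not check that tags are balanced or properly nested, so a renderer built on top of it would still need a stack, or at least a "current color" variable, to track what is open.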
Upvotes: 1
You can try HtmlAgilityPack.
Install the NuGet package by running this command: Install-Package HtmlAgilityPack
The following sample returns all the descendant nodes. I did not pass any level to Descendants, but you can add more code as needed.
It will parse your custom format.
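using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

string Text = "Hello <co value=\"FF0000FF\">World!</co>";
// Decode HTML entities (for example &amp;) before loading the text.
Text = System.Net.WebUtility.HtmlDecode(Text);
HtmlDocument result = new HtmlDocument();
result.LoadHtml(Text);
List<HtmlNode> nodes = result.DocumentNode.Descendants().ToList();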
Upvotes: 0