amr

Reputation: 816

Deserialize arbitrary XML

I'm working on a general system which requires the ability to deserialize arbitrary XML, meaning I don't know ahead of time what tags/attributes are going to be in the XML document. Ideally, XML like this:

<a att="1">
  <b />
</a>

Would become an object like this:

new Tag { name = "a", 
  attributes = new Dictionary<string,string> {{"att", "1"}},
  elements = new List<Tag> { new Tag { name = "b" } } }

It's perfectly fine if everything deserializes to strings.

Upvotes: 1

Views: 1174

Answers (2)

Dweeberly

Reputation: 4777

There isn't much detail to go on here. As an alternative interface on top of the LINQ to XML approach, you can try a dynamic approach. Here is an example program:

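using System;
using System.Collections.Generic;
using System.Dynamic;
using System.Linq;
using System.Xml.Linq;

class Program {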
    static void Main(string[] args) {
        var xmlstr = 
@"<a att='1'>
    <b attb='a b c'>
      <c att2='text'>value</c>
    </b>
</a>";
        dynamic xml = new DynamicXml(xmlstr);
        Console.WriteLine(xml.a[0].att);
        Console.WriteLine(xml.a[0].b[0].attb);
        Console.WriteLine(xml.a[0].b[0].c[0].att2);

        }

    public class DynamicXml: DynamicObject {
        XElement _root;
        IEnumerable<XElement> _xele;

        public DynamicXml(string xml) {
            var xdoc = XDocument.Parse(xml);
            _root = xdoc.Root;
            }

        DynamicXml(XElement root) {
            _root = root;
            }

        DynamicXml(XElement root, IEnumerable<XElement> xele) {
            _root = root;
            _xele = xele;
            }

        public override bool TryGetIndex(GetIndexBinder binder, object[] indexes, out object result) {
            // you should check binder.CallInfo, but for the example I'm assuming [n] where n is int type indexing
            var idx = (int)indexes[0];
            result = new DynamicXml(_xele.ElementAt(idx));
            return true;
            }

        public override bool TryGetMember(GetMemberBinder binder, out object result) {
            var atr = _root.Attributes(binder.Name).FirstOrDefault();
            if (atr != null) {
                result = atr.Value;
                return true;
                }
            var ele = _root.DescendantsAndSelf(binder.Name);
            // DescendantsAndSelf never returns null, so test for matches instead
            if (ele.Any()) {
                result = new DynamicXml(_root, ele);
                return true;
                }
            result = null;
            return false;
            }
        }
    }

One thing to note is that XML elements can contain a value as well as attributes, so you need a way to get at the value. In the code above you could reserve a special name such as "Value", but then you would not be able to read an attribute with that same name.
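One way around the reserved-name problem might be to expose the value through a conversion instead of a member. The following is only a sketch under assumptions: `DynamicElement` and `Demo` are hypothetical names, not part of the class above, and member access here returns the first matching child element directly rather than an indexable collection.

```csharp
using System;
using System.Dynamic;
using System.Xml.Linq;

// Hypothetical wrapper for illustration; not the DynamicXml class from the answer.
public class DynamicElement : DynamicObject
{
    private readonly XElement _element;

    public DynamicElement(XElement element) { _element = element; }

    // Attributes take precedence over child elements of the same name.
    public override bool TryGetMember(GetMemberBinder binder, out object result)
    {
        var attribute = _element.Attribute(binder.Name);
        if (attribute != null) { result = attribute.Value; return true; }

        var child = _element.Element(binder.Name);
        if (child != null) { result = new DynamicElement(child); return true; }

        result = null;
        return false;
    }

    // A conversion to string yields the element's text, so no member name is reserved for it.
    public override bool TryConvert(ConvertBinder binder, out object result)
    {
        if (binder.Type == typeof(string)) { result = _element.Value; return true; }
        result = null;
        return false;
    }
}

public static class Demo
{
    public static void Main()
    {
        dynamic a = new DynamicElement(
            XElement.Parse("<a att='1'><b><c att2='text'>value</c></b></a>"));
        string text = (string)a.b.c;   // element text, via TryConvert
        string att2 = a.b.c.att2;      // attribute, via TryGetMember
        Console.WriteLine(text);
        Console.WriteLine(att2);
    }
}
```

With this shape an attribute called `Value` stays reachable as `a.b.c.Value`, while the element text comes from the cast.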

Upvotes: 0

keenthinker

Reputation: 7830

As suggested, with LINQ to XML you can parse and extract data from an arbitrary XML file:

var xml = @"<?xml version=""1.0"" encoding=""utf-8""?><a id=""1"" name=""test"">an element<b>with sub-element</b></a>";
// load XML from string
var xmlDocument = XDocument.Parse(xml);
// OR load XML from file
//var xmlDocument = XDocument.Load(@"d:\temp\input.xml");
// find all elements of type b and in this case take the first one
var bNode = xmlDocument.Descendants("b").FirstOrDefault();
if (bNode != null)
{
    Console.WriteLine(bNode.Value);
}
// find the first element of type a and take the attribute name (TODO error handling)
Console.WriteLine(xmlDocument.Element("a").Attribute("name").Value);

Output is:

with sub-element
test
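The "TODO error handling" above could be handled roughly like this. It is a sketch, and `SafeAccess` and `AttributeOrNull` are made-up names. It relies on the explicit `XAttribute`-to-`string` conversion, which returns null for a null attribute instead of throwing.

```csharp
using System;
using System.Xml.Linq;

public static class SafeAccess
{
    // Hypothetical helper: returns the attribute value, or null when the
    // root-level element or the attribute does not exist.
    public static string AttributeOrNull(XDocument doc, string elementName, string attributeName)
    {
        XElement element = doc.Element(elementName);
        if (element == null) return null;
        // The explicit conversion yields null for a missing attribute,
        // whereas .Value would throw a NullReferenceException.
        return (string)element.Attribute(attributeName);
    }

    public static void Main()
    {
        var doc = XDocument.Parse(
            "<a id=\"1\" name=\"test\">an element<b>with sub-element</b></a>");
        Console.WriteLine(AttributeOrNull(doc, "a", "name"));
        Console.WriteLine(AttributeOrNull(doc, "a", "missing") == null);
        Console.WriteLine(AttributeOrNull(doc, "x", "name") == null);
    }
}
```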

You can also very easily convert your object to an XML file:

// sample class
public class Entry
{
    public string Name { get; set; }
    public int Count { get; set; }
}

// create and fill the object
var entry = new Entry { Name = "test", Count = 10 };
// create xml container
var xmlToCreate = new XElement("entry", 
                    new XAttribute("count", entry.Count),
                    new XElement("name", entry.Name));
// and save it
xmlToCreate.Save(@"d:\temp\test.xml");

Newly created XML file looks like this:

<?xml version="1.0" encoding="utf-8"?>
<entry count="10">
  <name>test</name>
</entry>

LINQ is very powerful and easy (and IMO intuitive) to use. This MSDN article gives good insight into LINQ and its range of functions and abilities through good samples. LINQPad, a minimalistic but very powerful IDE for .NET, comes with very good built-in LINQ to XML tutorials and examples. Finally, here is the list of all LINQ to XML extension methods at MSDN.
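Coming back to the shape asked for in the question, a recursive walk over `XElement` could build it. This is a minimal sketch under assumptions: the `Tag` fields follow the question's snippet, while the `value` field, the `TagReader` class, and the decision to ignore XML namespaces are my additions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public class Tag
{
    public string name;
    public string value;   // assumption: direct text content, not in the question's sketch
    public Dictionary<string, string> attributes = new Dictionary<string, string>();
    public List<Tag> elements = new List<Tag>();
}

// Hypothetical helper class for illustration.
public static class TagReader
{
    public static Tag FromXml(string xml)
    {
        return FromElement(XDocument.Parse(xml).Root);
    }

    public static Tag FromElement(XElement element)
    {
        return new Tag
        {
            // Namespaces are ignored here; two attributes with the same local
            // name in different namespaces would make ToDictionary throw.
            name = element.Name.LocalName,
            // Only text directly inside this element, not text of descendants.
            value = string.Concat(element.Nodes().OfType<XText>().Select(t => t.Value)),
            attributes = element.Attributes()
                .Where(a => !a.IsNamespaceDeclaration)
                .ToDictionary(a => a.Name.LocalName, a => a.Value),
            elements = element.Elements().Select(e => FromElement(e)).ToList()
        };
    }
}

public static class TagDemo
{
    public static void Main()
    {
        Tag root = TagReader.FromXml("<a att=\"1\"><b /></a>");
        Console.WriteLine(root.name);               // a
        Console.WriteLine(root.attributes["att"]);  // 1
        Console.WriteLine(root.elements[0].name);   // b
    }
}
```

Everything ends up as strings, which the question says is acceptable.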

Another possibility is to use the XmlReader class to parse an arbitrary XML file. Here you are responsible for implementing the parsing logic yourself, which can get cumbersome. Parsing the same input using XmlReader looks like this:

public void parseUsingXmlReader(string xmlString)
{
    using (XmlReader reader = XmlReader.Create(new StringReader(xmlString)))
    {
        while (reader.Read())
        {
            switch (reader.NodeType)
            {
                case XmlNodeType.Element:
                    Console.WriteLine(string.Format("Element - {0}", reader.Name));
                    if (reader.HasAttributes)
                    {
                        for (var i = 0; i < reader.AttributeCount; i++)
                        {
                            Console.WriteLine(string.Format("Attribute - {0}", reader.GetAttribute(i)));
                        }
                        reader.MoveToElement();
                    }
                    break;
                case XmlNodeType.Text:
                    Console.WriteLine(string.Format("Element value - {0}", reader.Value));
                    break;
                //case XmlNodeType.XmlDeclaration:
                //case XmlNodeType.ProcessingInstruction:
                //  Console.WriteLine(reader.Name + " - " + reader.Value);
                //  break;
                case XmlNodeType.Comment:
                    Console.WriteLine(reader.Value);
                    break;
                case XmlNodeType.EndElement:
                    Console.WriteLine(reader.Value);
                    break;
            }
        }
    }
}
// use the new function with the input from the first example
parseUsingXmlReader(xml);

The output is:

Element - a
Attribute - 1
Attribute - test
Element value - an element
Element - b
Element value - with sub-element

As you can see, you need to take care of node types, current position, attributes and so on manually.
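To illustrate what tracking the current position by hand can look like, here is a sketch that builds a small tree from the reader using a stack of open elements. `Node` and `ReaderTreeBuilder` are hypothetical names, and only elements, attributes, text and CDATA are handled.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical tree node for illustration.
public class Node
{
    public string Name;
    public string Text = "";
    public Dictionary<string, string> Attributes = new Dictionary<string, string>();
    public List<Node> Children = new List<Node>();
}

public static class ReaderTreeBuilder
{
    public static Node Build(string xml)
    {
        Node root = null;
        var open = new Stack<Node>();   // elements whose end tag has not been read yet
        using (var reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.Read())
            {
                switch (reader.NodeType)
                {
                    case XmlNodeType.Element:
                        var node = new Node { Name = reader.Name };
                        // Must be read before moving onto the attributes.
                        bool isEmpty = reader.IsEmptyElement;
                        while (reader.MoveToNextAttribute())
                            node.Attributes[reader.Name] = reader.Value;
                        if (open.Count > 0) open.Peek().Children.Add(node);
                        else root = node;
                        // An empty element such as <b /> produces no EndElement node.
                        if (!isEmpty) open.Push(node);
                        break;
                    case XmlNodeType.Text:
                    case XmlNodeType.CDATA:
                        open.Peek().Text += reader.Value;
                        break;
                    case XmlNodeType.EndElement:
                        open.Pop();
                        break;
                }
            }
        }
        return root;
    }

    public static void Main()
    {
        Node a = Build("<a id=\"1\" name=\"test\">an element<b>with sub-element</b></a>");
        Console.WriteLine(a.Name);
        Console.WriteLine(a.Attributes["name"]);
        Console.WriteLine(a.Text);
        Console.WriteLine(a.Children[0].Text);
    }
}
```

The stack plays the role that the element hierarchy plays implicitly in LINQ to XML, which is most of the extra bookkeeping XmlReader asks for.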

Upvotes: 3
