Peter M

Reputation: 7493

Deserialize badly designed XML

I'm in the process of refactoring some code that parses a pre-existing XML file (which I did not create and whose design I cannot control). Currently I read the XML into an XDocument and perform all sorts of LINQ queries to extract the data. What I want to do is use XML deserialization to do all that work for me.

I want to do it this way because other sections of the code use XML deserialization (and I want to make the code consistent in operation), and also to better document the structure of this XML file.

But buried 7 layers deep inside the XML is the following data:

<objects>
  <object name="Fred">
    <type>
      <BOOL/>
    </type>
  </object>
  <object name="Barney">
    <type>
      <WORD/>
    </type>
  </object>
  <object name="Wilma">
    <type>
      <derived name="Special1"/>
    </type>
  </object>
  <object name="Betty">
    <type>
      <array>
        <dimension upper="3" lower="0"/>
        <INT/>
      </array>
    </type>
  </object>
  <object name="Dino">
    <type>
      <array>
        <dimension upper="3" lower="0"/>
        <derived name="Special2"/>
      </array>
    </type>
  </object>
</objects>

Up until this point I had been able to get away with defining simple classes to model the XML.

But with the object data, the value of the type element is expressed as a sub-element (and not an attribute) for types that are well defined (e.g. BOOL, WORD). In the case of a user-defined type, a different sub-element is used, with the ultimate type name being defined in the name attribute of that sub-element (e.g. Special1 or Special2). (Also note that I don't and can't have a complete list of the standard types.)

(Note that while this XML is badly designed, it is not malformed.)

Then things get a little more confusing when the object is an array and the type is wrapped in an array element.

Ultimately I'd want the type (encompassing both standard and user-defined) and the array dimensions as properties of the object class, along with an indicator that a derived type was encountered.

I am not sure how to build a class that could be deserialized from this XML. I suspect that I need to delve into some custom XML processing for just this class (or perhaps an XSLT transformation?).

Upvotes: 1

Views: 428

Answers (2)

Lauro

Reputation: 911

As has already been said, I do not see any alternative to handling it manually, as you are already doing.

If neither you nor (probably) the XML's owner knows what the XML will look like, schema-wise, why would you expect any technique to understand it automatically?

So I think you got it right by doing it manually.

Upvotes: 0

Sinatr

Reputation: 21989

Why use XML deserialization to do all that work for you?

You have already done it manually, so why do you want to do extra work? And the extra work would be a lot of classes with a lot of substitutions via attributes (to handle that <type> element, for example).

Or continue from my example here, if you wish.

Here is deserialization; as you can see, it's exactly the same.

For convenience, I'll post the code here:

using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

public class Program
{
    // Root element; the name matches the lowercase <objects> in the question's XML.
    [Serializable]
    [XmlRoot("objects")]
    public class MyXml
    {
        [XmlElement("object")]
        public MyObject[] MyObjects;
    }

    [Serializable]
    [XmlRoot("object")]
    public class MyObject
    {
        [XmlAttribute("name")]
        public string MyName;
        // Declared as object, so the contents of <type> are not modeled here.
        [XmlElement("type")]
        public object MyType;
    }

    public static void Main()
    {
        var data = new MyXml();
        data.MyObjects = new MyObject[] { new MyObject() { MyName = "Fred" }, new MyObject() };
        using (var stream = new MemoryStream())
        {
            // An empty namespace mapping suppresses the default xmlns:xsi/xsd attributes.
            var space = new XmlSerializerNamespaces();
            space.Add("", "");
            var serializer = new XmlSerializer(data.GetType());
            serializer.Serialize(stream, data, space);
            // The serializer writes UTF-8 to a stream by default, so decode as UTF-8.
            var text = Encoding.UTF8.GetString(stream.ToArray());
            foreach (var line in text.Split(Environment.NewLine.ToCharArray(), StringSplitOptions.RemoveEmptyEntries))
            {
                Console.WriteLine(line);
            }

            // Rewind the stream and read the same XML back.
            stream.Seek(0, SeekOrigin.Begin);
            var test = (MyXml)serializer.Deserialize(stream);
            Console.WriteLine("\nTest: " + test.MyObjects[0].MyName);
        }
    }
}
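
If you do still want to go the deserialization route, one possible way to cope with the open-ended <type> content is to let [XmlAnyElement] capture whatever child element appears and then expose the type name, the derived flag and the array bounds as computed properties. This is only a rough sketch under assumptions: the class names (ObjectList, ObjectEntry, TypeSpec, Demo) are made up for illustration, it assumes <type> always has exactly one child element, and it assumes an <array> holds a single <dimension> plus one element naming the type, as in the question's sample.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical names throughout; only the XML element/attribute names come from the question.
[XmlRoot("objects")]
public class ObjectList
{
    [XmlElement("object")]
    public ObjectEntry[] Items;
}

public class ObjectEntry
{
    [XmlAttribute("name")]
    public string Name;

    [XmlElement("type")]
    public TypeSpec Spec;
}

public class TypeSpec
{
    // Receives every child element of <type> that has no explicit mapping,
    // e.g. <BOOL/>, <derived .../> or <array>...</array>.
    [XmlAnyElement]
    public XmlElement[] Children;

    // Assumption: <type> always contains exactly one child element.
    private XmlElement Root { get { return Children[0]; } }

    // The element naming the type: the non-<dimension> child of <array>, or Root itself.
    private XmlElement Leaf
    {
        get
        {
            if (!IsArray) return Root;
            return Root.ChildNodes.OfType<XmlElement>().First(e => e.LocalName != "dimension");
        }
    }

    [XmlIgnore] public bool IsArray { get { return Root.LocalName == "array"; } }
    [XmlIgnore] public bool IsDerived { get { return Leaf.LocalName == "derived"; } }
    [XmlIgnore] public string TypeName { get { return IsDerived ? Leaf.GetAttribute("name") : Leaf.LocalName; } }
    [XmlIgnore] public int? Lower { get { return Bound("lower"); } }
    [XmlIgnore] public int? Upper { get { return Bound("upper"); } }

    // Reads a bound from the single <dimension> element; null when the type is not an array.
    private int? Bound(string attribute)
    {
        if (!IsArray) return null;
        return int.Parse(Root["dimension"].GetAttribute(attribute));
    }
}

public static class Demo
{
    public static ObjectList Parse(string xml)
    {
        var serializer = new XmlSerializer(typeof(ObjectList));
        using (var reader = new StringReader(xml))
        {
            return (ObjectList)serializer.Deserialize(reader);
        }
    }

    public static void Main()
    {
        const string xml = @"<objects>
  <object name=""Fred""><type><BOOL/></type></object>
  <object name=""Wilma""><type><derived name=""Special1""/></type></object>
  <object name=""Dino""><type><array><dimension upper=""3"" lower=""0""/><derived name=""Special2""/></array></type></object>
</objects>";
        foreach (var o in Parse(xml).Items)
        {
            Console.WriteLine("{0}: {1} derived={2} array={3} lower={4} upper={5}",
                o.Name, o.Spec.TypeName, o.Spec.IsDerived, o.Spec.IsArray, o.Spec.Lower, o.Spec.Upper);
        }
    }
}
```

The point of this design is that you never need a complete list of the standard types: any unknown element name simply becomes the TypeName. The trade-off is that the interesting part is still hand-written DOM inspection, just hidden inside one class.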

Upvotes: 1
