Craig

Parsing XML safely

I have been asked to write a method that takes XML with the following structure and parses it into a List<> of objects:

<?xml version="1.0" encoding="UTF-8"?>
<comments>
    <comment>
        <date>11/MAR/2014 10:34am</date>
        <userid>1</userid>
        <text>This is a comment. Please remember to try with some formatting. I'm assuming that some charactors need to be prefixed with a backslash.</text>
    </comment>
    <comment>
        <date>11/MAR/2014 10:37am</date>
        <userid>1</userid>
        <text>This is another comment./r/nIt\'s showing how more than one comment would be stored.\r\n\r\nAlso, this one has some really hardcore escape charactors!</text>
    </comment>
</comments>

There can be many 'comment' elements.

So with my basic knowledge of XML, I tried this:

using System;
using System.Collections.Generic;
using System.Xml;

private List<Comment> XMLToList(string xml)
{
    var result = new List<Comment>();

    XmlDocument xmlDoc = new XmlDocument();
    xmlDoc.LoadXml(xml);

    XmlNodeList xnList = xmlDoc.SelectNodes("/comments/comment");

    foreach (XmlNode node in xnList)
    {
        // node["name"] returns null when the child element is missing,
        // so .InnerText then throws a NullReferenceException
        var id = node["userid"].InnerText;
        var date = node["date"].InnerText;
        var text = node["text"].InnerText;

        result.Add(new Comment
        {
            Date = DateTime.Parse(date),
            Text = text,
            UserID = int.Parse(id)
        });
    }

    return result;
}

This works, but it is very unsafe. What if nodes are missing, or misspelt?

Is there a way to ensure the XML doc is correctly formatted first? And then, is there a safer way to get the data?

(I can fix the parsing of the date and int values; this is just a test. It's really the accessing of the nodes that I am trying to resolve.)

Can it be done with LINQ, maybe?

Answers (1)

Selman Genç

You can do it with LINQ to XML easily:

var comments = XDocument.Parse(xml).Descendants("comment")
    .Select(x => new Comment
    {
        Date = x.Element("date") != null ? (DateTime)x.Element("date") : default(DateTime),
        Text = (string)x.Element("text"),
        UserID = x.Element("userid") != null ? (int)x.Element("userid") : default(int)
    }).ToList();

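The key point is using an explicit cast to read the values. I have added null checks for the value types because casting a null XElement to a non-nullable value type such as DateTime or int throws an ArgumentNullException. String is not a problem: casting a missing element to string just returns null.

Note that the explicit DateTime cast parses the value with XmlConvert, so it expects the xs:dateTime (ISO 8601) format, such as 2014-03-11T10:34:00. A value like 11/MAR/2014 10:34am throws a FormatException, so for the dates in the question you need to read the element as a string and parse it yourself, for example with DateTime.ParseExact.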
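A slightly shorter variant of the same idea, offered as a sketch rather than a drop-in replacement: casting to a nullable type such as (int?) returns null for a missing element, so ?? can supply the default without a separate null check. Because the explicit DateTime cast only accepts xs:dateTime values, the date is read as a string and parsed with DateTime.TryParseExact. The Comment class and the format string "dd/MMM/yyyy h:mmtt" are assumptions based on the question's sample.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Xml.Linq;

// Hypothetical Comment type mirroring the properties used in the question.
public class Comment
{
    public DateTime Date { get; set; }
    public string Text { get; set; }
    public int UserID { get; set; }
}

public static class CommentParser
{
    // Assumed format of the <date> values, e.g. "11/MAR/2014 10:34am".
    private const string DateFormat = "dd/MMM/yyyy h:mmtt";

    public static List<Comment> Parse(string xml)
    {
        return XDocument.Parse(xml)
            .Descendants("comment")
            .Select(x => new Comment
            {
                // (string) cast: a missing element yields null instead of throwing.
                Date = ParseDate((string)x.Element("date")),
                Text = (string)x.Element("text"),
                // (int?) cast: a missing element yields null and ?? supplies 0.
                // A present but non-numeric value still throws a FormatException.
                UserID = (int?)x.Element("userid") ?? 0
            })
            .ToList();
    }

    // Returns default(DateTime) when the value is missing or not in DateFormat.
    private static DateTime ParseDate(string value)
    {
        DateTime date;
        return DateTime.TryParseExact(value, DateFormat, CultureInfo.InvariantCulture,
                                      DateTimeStyles.None, out date)
            ? date
            : default(DateTime);
    }
}
```

Month names and the am/pm designator are matched case-insensitively, so "MAR" and "am" parse with the invariant culture.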

If you want to make sure the XML is well-formed, you can wrap the parsing in a try/catch like this:

try
{
    var doc = XDocument.Parse(xml);
    // ... read the comments from doc here
}
catch (XmlException ex)
{
    // XDocument.Parse throws an XmlException when the XML is not
    // well-formed; ex.LineNumber and ex.LinePosition locate the problem
}
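The try/catch only tells you whether the XML is well-formed; it does not report a missing or misspelt element. If you also want to check the structure before reading it, one option is to validate the document against an XML Schema using the Validate extension method from System.Xml.Schema. The schema below is a hypothetical one written for the sample in the question, and CommentValidator is an illustrative name; adjust the types and occurrence rules to your real data.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Schema;

public static class CommentValidator
{
    // Hypothetical schema: a <comments> root with any number of <comment>
    // elements, each containing <date>, <userid> and <text> in that order.
    // <date> is xs:string because the sample values are not xs:dateTime.
    private const string Xsd = @"<xs:schema xmlns:xs=""http://www.w3.org/2001/XMLSchema"">
  <xs:element name=""comments"">
    <xs:complexType>
      <xs:sequence>
        <xs:element name=""comment"" minOccurs=""0"" maxOccurs=""unbounded"">
          <xs:complexType>
            <xs:sequence>
              <xs:element name=""date"" type=""xs:string"" />
              <xs:element name=""userid"" type=""xs:int"" />
              <xs:element name=""text"" type=""xs:string"" />
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>";

    // Returns the schema validation messages; an empty list means the XML matched.
    // An XmlException from XDocument.Parse (not well-formed) propagates to the caller.
    public static List<string> Validate(string xml)
    {
        var schemas = new XmlSchemaSet();
        using (var reader = XmlReader.Create(new StringReader(Xsd)))
        {
            // null target namespace: use the schema's own (here, no namespace)
            schemas.Add(null, reader);
        }

        var errors = new List<string>();
        XDocument doc = XDocument.Parse(xml);
        doc.Validate(schemas, (sender, e) => errors.Add(e.Message));
        return errors;
    }
}
```

With a schema in place, a misspelt element such as <texts>, a missing <userid>, or a non-numeric user id is reported as a validation message instead of surfacing later as an exception or a silent default.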
