Alex

Reputation: 398

Parsing a UN XML file in C#

I'm trying to parse an XML file from the UN website (http://www.un.org/sc/committees/1267/AQList.xml) using C#.

There is one problem I keep running into with this file: the number of child tags varies from one <INDIVIDUAL> tag to another. One example is the <FOURTH_NAME> child tag.

I've tried a number of different approaches, but I always get stuck on the same problem: the differing number of child tags inside each <INDIVIDUAL> tag.

What I'm trying to achieve is to collect all the tags and their values under one <INDIVIDUAL> tag, and then insert only those I want into my database. If a tag is missing, for example <FOURTH_NAME>, then I need to insert only the first three names into the database and skip the fourth.

I've tried using LINQ to XML, and here are some examples:

    XDocument xdoc = XDocument.Load(path);

    var tags = (from t in xdoc.Descendants("INDIVIDUALS")
                from a in t.Elements("INDIVIDUAL")
                select new
                {
                    Tag = a.Name,
                    val = a.Value
                });

    foreach (var obj in tags)
    {
        Console.WriteLine(obj.Tag + " - " + obj.val + "\t");
        // insert SQL goes here
    }

Or this one, but it only returns individuals that actually have a FOURTH_NAME tag:

    var q = (from c in xdoc.Descendants("INDIVIDUAL")
             from _1 in c.Elements("FIRST_NAME")
             from _2 in c.Elements("SECOND_NAME")
             from _3 in c.Elements("THIRD_NAME")
             from _4 in c.Elements("FOURTH_NAME")
             where _1 != null && _2 != null && _3 != null && _4 != null
             select new
             {
                 _1 = c.Element("FIRST_NAME").Value,
                 _2 = c.Element("SECOND_NAME").Value,
                 _3 = c.Element("THIRD_NAME").Value,
                 _4 = c.Element("FOURTH_NAME").Value
             });

    foreach (var obj in q)
    {
        Console.WriteLine("Person: " + obj._1 + " - " + obj._2 + " - " + obj._3 + " - " + obj._4);
        // insert SQL goes here
    }

Any ideas?

Upvotes: 0

Views: 136

Answers (2)

Jim Wooley

Reputation: 10418

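Instead of calling Value on the element, consider casting the element to string. Calling Value on a missing element throws a NullReferenceException, whereas the explicit string conversion in LINQ to XML safely returns null if the element doesn't exist. Try the following:

var data = XElement.Load(@"http://www.un.org/sc/committees/1267/AQList.xml");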
var individuals = data.Descendants("INDIVIDUAL")
    .Select(i => new {
        First = (string)i.Element("FIRST_NAME"),
        Middle = (string)i.Element("SECOND_NAME"),
        Last = (string)i.Element("THIRD_NAME")
    });
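
To make the null behaviour concrete, here is a minimal sketch that runs against a small inline document instead of the live file. The sample XML, the names in it, the class name NameCastDemo and the helper ReadFullNames are all made up for illustration; only the element names come from the question. It casts each name element to string and joins whichever parts are present, so an individual without a FOURTH_NAME simply ends up with a shorter name.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class NameCastDemo
{
    // Hypothetical sample data: the second INDIVIDUAL has only two name elements.
    public const string SampleXml = @"<CONSOLIDATED_LIST>
  <INDIVIDUALS>
    <INDIVIDUAL>
      <FIRST_NAME>Ada</FIRST_NAME>
      <SECOND_NAME>Mary</SECOND_NAME>
      <THIRD_NAME>King</THIRD_NAME>
      <FOURTH_NAME>Lovelace</FOURTH_NAME>
    </INDIVIDUAL>
    <INDIVIDUAL>
      <FIRST_NAME>Alan</FIRST_NAME>
      <SECOND_NAME>Turing</SECOND_NAME>
    </INDIVIDUAL>
  </INDIVIDUALS>
</CONSOLIDATED_LIST>";

    // Returns one space-separated name per INDIVIDUAL, skipping missing or empty parts.
    public static string[] ReadFullNames(string xml)
    {
        var data = XElement.Parse(xml);
        return data.Descendants("INDIVIDUAL")
            .Select(i => new[]
            {
                // The string cast yields null when Element(...) returns null.
                (string)i.Element("FIRST_NAME"),
                (string)i.Element("SECOND_NAME"),
                (string)i.Element("THIRD_NAME"),
                (string)i.Element("FOURTH_NAME")
            })
            .Select(parts => string.Join(" ", parts.Where(p => !string.IsNullOrWhiteSpace(p))))
            .ToArray();
    }

    public static void Main()
    {
        foreach (var name in ReadFullNames(SampleXml))
        {
            Console.WriteLine(name);
        }
    }
}
```

For the database insert, the same null values can be passed through as-is (or mapped to DBNull.Value), so no individual gets dropped just because one name element is absent.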

If you want to be more flexible and get all of the name fields, you can do something like the following. (I'll leave the process of grouping individuals as an additional homework assignment ;-)

var nameFields = data.Descendants("INDIVIDUAL").Elements()
    .Where(i => i.Name.LocalName.EndsWith("_NAME"))
    .Select(i => new { FieldName = i.Name.LocalName, Value = i.Value });
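
One possible way to do the grouping is to build a dictionary of name fields per INDIVIDUAL rather than one flat sequence. This is only a sketch under assumptions: the inline sample XML, the class name NameFieldGrouping and the helper ReadNameFields are hypothetical, and if a field name repeats inside one individual it keeps the first value.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class NameFieldGrouping
{
    // Hypothetical sample data; only the element names are taken from the question.
    public const string SampleXml = @"<CONSOLIDATED_LIST>
  <INDIVIDUALS>
    <INDIVIDUAL>
      <DATAID>1</DATAID>
      <FIRST_NAME>Ada</FIRST_NAME>
      <SECOND_NAME>Mary</SECOND_NAME>
      <THIRD_NAME>King</THIRD_NAME>
      <FOURTH_NAME>Lovelace</FOURTH_NAME>
    </INDIVIDUAL>
    <INDIVIDUAL>
      <DATAID>2</DATAID>
      <FIRST_NAME>Alan</FIRST_NAME>
      <SECOND_NAME>Turing</SECOND_NAME>
    </INDIVIDUAL>
  </INDIVIDUALS>
</CONSOLIDATED_LIST>";

    // Returns one dictionary per INDIVIDUAL, keyed by element name (e.g. "FIRST_NAME").
    public static List<Dictionary<string, string>> ReadNameFields(string xml)
    {
        return XElement.Parse(xml)
            .Descendants("INDIVIDUAL")
            .Select(individual => individual.Elements()
                // Keep only direct children whose tag ends in _NAME.
                .Where(e => e.Name.LocalName.EndsWith("_NAME", StringComparison.Ordinal))
                // Assumption: if a tag repeats, the first occurrence wins.
                .GroupBy(e => e.Name.LocalName)
                .ToDictionary(g => g.Key, g => g.First().Value))
            .ToList();
    }

    public static void Main()
    {
        foreach (var fields in ReadNameFields(SampleXml))
        {
            Console.WriteLine(string.Join(", ", fields.Select(f => f.Key + "=" + f.Value)));
        }
    }
}
```

A missing field then shows up as a missing key, which you can check with ContainsKey or TryGetValue before building the insert statement.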

Upvotes: 1

aybe

Reputation: 16682

Why don't you use XmlSerializer and LINQ instead?

As explained in this answer, generate your classes by copying the XML and pasting it into a new .cs file in Visual Studio:

Edit > Paste Special > Paste XML As Classes.

Then grab your data as follows:

var serializer = new XmlSerializer(typeof (CONSOLIDATED_LIST));
using (FileStream fileStream = File.OpenRead(@"..\..\aqlist.xml"))
{
    var list = serializer.Deserialize(fileStream) as CONSOLIDATED_LIST;
    if (list != null)
    {
        var enumerable = list.INDIVIDUALS.Select(s => new
        {
            FirstName = s.FIRST_NAME,
            SecondName = s.SECOND_NAME,
            ThirdName = s.THIRD_NAME,
            FourthName = s.FOURTH_NAME
        });
    }
}


You can then add whatever predicate best suits your needs.

Going this route will be a huge time-saver and less error-prone: no strings to access fields, strong typing, etc.
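
If you would rather not paste the whole generated file, here is a rough sketch of the same approach with two small hand-written classes standing in for the generated ones. The class names ConsolidatedList, Individual and XmlSerializerDemo, the property names and the inline sample XML are my own assumptions; the generated code will look different. The point it shows is that a missing element such as FOURTH_NAME just leaves the string property null.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hand-written stand-ins for the classes that Paste XML As Classes would generate.
[XmlRoot("CONSOLIDATED_LIST")]
public class ConsolidatedList
{
    [XmlArray("INDIVIDUALS")]
    [XmlArrayItem("INDIVIDUAL")]
    public Individual[] Individuals { get; set; }
}

public class Individual
{
    [XmlElement("FIRST_NAME")] public string FirstName { get; set; }
    [XmlElement("SECOND_NAME")] public string SecondName { get; set; }
    [XmlElement("THIRD_NAME")] public string ThirdName { get; set; }
    [XmlElement("FOURTH_NAME")] public string FourthName { get; set; }
}

public static class XmlSerializerDemo
{
    // Hypothetical sample data: the second INDIVIDUAL has no THIRD_NAME or FOURTH_NAME.
    public const string SampleXml = @"<CONSOLIDATED_LIST>
  <INDIVIDUALS>
    <INDIVIDUAL>
      <FIRST_NAME>Ada</FIRST_NAME>
      <SECOND_NAME>Mary</SECOND_NAME>
      <THIRD_NAME>King</THIRD_NAME>
      <FOURTH_NAME>Lovelace</FOURTH_NAME>
    </INDIVIDUAL>
    <INDIVIDUAL>
      <FIRST_NAME>Alan</FIRST_NAME>
      <SECOND_NAME>Turing</SECOND_NAME>
    </INDIVIDUAL>
  </INDIVIDUALS>
</CONSOLIDATED_LIST>";

    // Deserializes the XML text into the stand-in classes above.
    public static ConsolidatedList Load(string xml)
    {
        var serializer = new XmlSerializer(typeof(ConsolidatedList));
        using (var reader = new StringReader(xml))
        {
            return (ConsolidatedList)serializer.Deserialize(reader);
        }
    }

    public static void Main()
    {
        foreach (var person in Load(SampleXml).Individuals)
        {
            // FourthName is null when the element was not present.
            Console.WriteLine(person.FirstName + " / " + (person.FourthName ?? "(no fourth name)"));
        }
    }
}
```

Elements that the classes do not declare are skipped by the serializer by default, so you only need to model the fields you actually want to insert.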

Upvotes: 1
