Reputation: 21

Issue parsing XML document in c#

I am trying to get the inner text of specific elements in an XML document that is passed in as a string, and I can't work out why it's not finding any nodes.

This code runs fine, but never enters either of the foreach loops, as ocNodesCompany and ocNodesOrgs both have zero elements. Why does GetElementsByTagName not find the nodes?

BTW I've also tried:

XmlNodeList ocNodesOrgs = thisXmlDoc.SelectNodes("//OpenCalaisSimple/CalaisSimpleOutputFormat/Company");

Code:

public static ArrayList getTwitterHandles(String ocXML)
{
    ArrayList thisList = new ArrayList();
    XmlDocument thisXmlDoc = new XmlDocument();
    thisXmlDoc.LoadXml(ocXML);

    //get Companies
    XmlNodeList ocNodesCompany = thisXmlDoc.GetElementsByTagName("Company");
    foreach (XmlElement element in ocNodesCompany)
    {
        thisList.Add(element.InnerText);
    }

    //Get Organisations
    XmlNodeList ocNodesOrgs = thisXmlDoc.GetElementsByTagName("Organization");
    foreach (XmlElement element in ocNodesOrgs)
    {
        thisList.Add(element.InnerText);
    }

    //Get Organisations

    return thisList;
}

My XML String is:

<!--Use of the Calais Web Service is governed by the Terms of Service located at http://www.opencalais.com. By using this service or the results of the service you agree to these terms of service.--><!-- Company: BBC,T-mobile,Vodafone,GE, IndustryTerm: open calais services, Organization: Federal Bureau of Investigation,Red Cross,Greenpeace,Royal Navy,-->

<OpenCalaisSimple>
    <Description>
        <calaisRequestID>38cb8898-48ba-85ff-12e9-f8d629568428</calaisRequestID>
        <id>http://id.opencalais.com/lt0Hf8XWIr2DNIJzNlaXlA</id>
        <about>http://d.opencalais.com/dochash-1/ff929eb2-de43-3ed1-8ee4-6109abf6bf77</about>
        <docTitle/>
        <docDate>2011-03-10 06:36:08.646</docDate>
        <externalMetadata/>
    </Description>
    <CalaisSimpleOutputFormat>
        <Company count="1" relevance="0.603" normalized="British Broadcasting Corporation">BBC</Company>
        <Company count="1" relevance="0.603" normalized="T-MOBILE NETHERLANDS HOLDING B.V.">T-mobile</Company>
        <Company count="1" relevance="0.603" normalized="Vodafone Group Plc">Vodafone</Company>
        <Company count="1" relevance="0.603" normalized="General Electric Company">GE</Company>
        <IndustryTerm count="1" relevance="0.603">open calais services</IndustryTerm>
        <Organization count="1" relevance="0.603">Red Cross</Organization>
        <Organization count="1" relevance="0.603">Greenpeace</Organization>
        <Organization count="1" relevance="0.603">Royal Navy</Organization>
        <Topics>
            <Topic Taxonomy="Calais" Score="0.899">Human Interest</Topic>
            <Topic Taxonomy="Calais" Score="0.694">Technology_Internet</Topic>
        </Topics>
    </CalaisSimpleOutputFormat>
</OpenCalaisSimple>

Upvotes: 2

Answers (3)

mgronber

Reputation: 3419

You could also use XDocument from the System.Xml.Linq namespace. The following snippet is almost equivalent to your code, except that the return type is List<string> instead of ArrayList.

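// Requires: using System.Collections.Generic; using System.Linq; using System.Xml.Linq;
public static List<string> getTwitterHandles(String ocXml)
{
    var xml = XDocument.Parse(ocXml);
    // Collect Company elements first, then Organization elements, at any depth.
    var list = xml.Descendants("Company")
            .Concat(xml.Descendants("Organization"))
            .Select(element => element.Value)
            .ToList();
    return list;
}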

Upvotes: 0

Matt Peddlesden

Reputation: 520

Note that Microsoft also recommends using XPath. Here is their help page for the GetElementsByTagName method; note the comment towards the middle recommending SelectNodes (which takes an XPath expression) instead.

http://msdn.microsoft.com/en-us/library/dc0c9ekk.aspx

A variation of your method, written with XPath, would be:

public static ArrayList getTwitterHandles(String ocXML)
{
    ArrayList thisList = new ArrayList();
    XmlDocument thisXmlDoc = new XmlDocument();
    thisXmlDoc.LoadXml(ocXML);

    //get Companies
    XmlNodeList ocNodesCompany = thisXmlDoc.SelectNodes("//Company");
    foreach (XmlElement element in ocNodesCompany)
    {
        thisList.Add(element.InnerText);
    }

    //Get Organisations
    XmlNodeList ocNodesOrgs = thisXmlDoc.SelectNodes("//Organization");
    foreach (XmlElement element in ocNodesOrgs)
    {
        thisList.Add(element.InnerText);
    }

    return thisList;
}

Note that the above implements what I believe is the functionality you have in your example, which is not quite the same as the XPath you've tried. In XPath, a leading "//" means "at any depth", so "//Company" will pick up ANY element named Company anywhere below the node you query from, whatever its parent is.

If you only want specific Company nodes, then you can be more specific:

    XmlNodeList ocNodesCompany = thisXmlDoc.SelectNodes("//Company");

becomes

    XmlNodeList ocNodesCompany = thisXmlDoc.SelectNodes("/OpenCalaisSimple/CalaisSimpleOutputFormat/Company");

Note the key difference is that there is only ONE forward slash at the beginning, which anchors the path at the document root instead of searching at every depth.

I've just tested both variations and they work great.
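
To make the difference between the two forms visible, here is a minimal sketch. The class name XPathScopeDemo, the Count helper and the cut-down sample XML are made up for illustration; in particular, the extra Company element under Description is not in your data and is only there so that the two queries return different results.

```csharp
using System;
using System.Xml;

public static class XPathScopeDemo
{
    // Hypothetical, cut-down sample. The <Company> under <Description>
    // is NOT in the original document; it is added to show the difference.
    private const string SampleXml =
        "<OpenCalaisSimple>" +
        "<Description><Company>Elsewhere</Company></Description>" +
        "<CalaisSimpleOutputFormat>" +
        "<Company>BBC</Company>" +
        "<Company>Vodafone</Company>" +
        "</CalaisSimpleOutputFormat>" +
        "</OpenCalaisSimple>";

    // Loads the sample and returns how many nodes the XPath expression selects.
    public static int Count(string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(SampleXml);
        return doc.SelectNodes(xpath).Count;
    }

    public static void Main()
    {
        // Any Company element, at any depth.
        Console.WriteLine(Count("//Company"));
        // Only Company elements directly under CalaisSimpleOutputFormat.
        Console.WriteLine(Count("/OpenCalaisSimple/CalaisSimpleOutputFormat/Company"));
    }
}
```

With this sample, the first query selects three nodes and the second selects two, because the absolute path skips the Company element that sits under Description.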

If you're handling XML files then I would strongly recommend you read up on XPath and become a guru at it. It's exceptionally handy for rapidly writing code that parses XML files and picks out precisely what you need (though I should add it's not the only way to do it, and it is certainly not appropriate for all circumstances of course :) )
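
As one small illustration of picking out precisely what you need, the XPath union operator "|" lets a single SelectNodes call fetch both element types. This is only a sketch: the class name XPathUnionDemo, the GetNames helper and the shortened sample string are made up for the example, and it returns List<string> rather than ArrayList.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class XPathUnionDemo
{
    // Returns the inner text of every Company and Organization element.
    public static List<string> GetNames(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        var names = new List<string>();
        // "|" is the XPath union operator: one query matches both element names.
        foreach (XmlNode node in doc.SelectNodes("//Company | //Organization"))
        {
            names.Add(node.InnerText);
        }
        return names;
    }

    public static void Main()
    {
        // Hypothetical, shortened sample based on the structure in the question.
        const string sample =
            "<OpenCalaisSimple><CalaisSimpleOutputFormat>" +
            "<Company>BBC</Company>" +
            "<IndustryTerm>open calais services</IndustryTerm>" +
            "<Organization>Red Cross</Organization>" +
            "</CalaisSimpleOutputFormat></OpenCalaisSimple>";

        Console.WriteLine(string.Join(", ", GetNames(sample)));
    }
}
```

One thing to keep in mind: a union query should return its matches in document order, so if Company and Organization elements were interleaved in the XML, the result would be interleaved too, unlike the two separate loops above, which always list the companies first.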

Hope this helps.

Upvotes: 2

madcyree

Reputation: 1457

It seems like you should use an XPath query to get the elements you want to receive. You can read about it here

Upvotes: 0
