Reputation: 21
I am trying to get the inner text of specific elements in an XML document that is passed in as a string, and I can't work out why no nodes are found.
This code runs fine, but it never enters either of the foreach loops because ocNodesCompany and ocNodesOrgs both have zero elements. Why does GetElementsByTagName not find the nodes?
BTW I've also tried:
XmlNodeList ocNodesOrgs = thisXmlDoc.SelectNodes("//OpenCalaisSimple/CalaisSimpleOutputFormat/Company");
Code:
public static ArrayList getTwitterHandles(String ocXML)
{
    ArrayList thisList = new ArrayList();
    XmlDocument thisXmlDoc = new XmlDocument();
    thisXmlDoc.LoadXml(ocXML);

    //get Companies
    XmlNodeList ocNodesCompany = thisXmlDoc.GetElementsByTagName("Company");
    foreach (XmlElement element in ocNodesCompany)
    {
        thisList.Add(element.InnerText);
    }

    //Get Organisations
    XmlNodeList ocNodesOrgs = thisXmlDoc.GetElementsByTagName("Organization");
    foreach (XmlElement element in ocNodesOrgs)
    {
        thisList.Add(element.InnerText);
    }

    return thisList;
}
My XML String is:
<!--Use of the Calais Web Service is governed by the Terms of Service located at http://www.opencalais.com. By using this service or the results of the service you agree to these terms of service.--><!-- Company: BBC,T-mobile,Vodafone,GE, IndustryTerm: open calais services, Organization: Federal Bureau of Investigation,Red Cross,Greenpeace,Royal Navy,-->
<OpenCalaisSimple>
<Description>
<calaisRequestID>38cb8898-48ba-85ff-12e9-f8d629568428</calaisRequestID>
<id>http://id.opencalais.com/lt0Hf8XWIr2DNIJzNlaXlA</id>
<about>http://d.opencalais.com/dochash-1/ff929eb2-de43-3ed1-8ee4-6109abf6bf77</about>
<docTitle/>
<docDate>2011-03-10 06:36:08.646</docDate>
<externalMetadata/>
</Description>
<CalaisSimpleOutputFormat>
<Company count="1" relevance="0.603" normalized="British Broadcasting Corporation">BBC</Company>
<Company count="1" relevance="0.603" normalized="T-MOBILE NETHERLANDS HOLDING B.V.">T-mobile</Company>
<Company count="1" relevance="0.603" normalized="Vodafone Group Plc">Vodafone</Company>
<Company count="1" relevance="0.603" normalized="General Electric Company">GE</Company>
<IndustryTerm count="1" relevance="0.603">open calais services</IndustryTerm>
<Organization count="1" relevance="0.603">Red Cross</Organization>
<Organization count="1" relevance="0.603">Greenpeace</Organization>
<Organization count="1" relevance="0.603">Royal Navy</Organization>
<Topics>
<Topic Taxonomy="Calais" Score="0.899">Human Interest</Topic>
<Topic Taxonomy="Calais" Score="0.694">Technology_Internet</Topic>
</Topics>
</CalaisSimpleOutputFormat>
</OpenCalaisSimple>
Upvotes: 2
Views: 421
Reputation: 3419
You could also use XDocument from the System.Xml.Linq namespace. The following snippet is almost equivalent to your code. The return type is List<string> instead of ArrayList.
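// Needs: using System.Collections.Generic; using System.Linq; using System.Xml.Linq;
public static List<string> getTwitterHandles(String ocXml)
{
    var xml = XDocument.Parse(ocXml);
    // All Company elements first, then all Organization elements
    var list = xml.Descendants("Company")
        .Concat(xml.Descendants("Organization"))
        .Select(element => element.Value)
        .ToList();
    return list;
}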
Upvotes: 0
Reputation: 520
Note that Microsoft also recommends XPath. Here is their help page for the GetElementsByTagName method; note the comment towards the middle recommending the use of SelectNodes instead (which uses XPath).
http://msdn.microsoft.com/en-us/library/dc0c9ekk.aspx
A variation of your method, written with XPath, would be:
public static ArrayList getTwitterHandles(String ocXML)
{
    ArrayList thisList = new ArrayList();
    XmlDocument thisXmlDoc = new XmlDocument();
    thisXmlDoc.LoadXml(ocXML);

    //get Companies
    XmlNodeList ocNodesCompany = thisXmlDoc.SelectNodes("//Company");
    foreach (XmlElement element in ocNodesCompany)
    {
        thisList.Add(element.InnerText);
    }

    //Get Organisations
    XmlNodeList ocNodesOrgs = thisXmlDoc.SelectNodes("//Organization");
    foreach (XmlElement element in ocNodesOrgs)
    {
        thisList.Add(element.InnerText);
    }

    return thisList;
}
Note that the above implements what I believe is the functionality you have in your example, which is not quite the same as the XPath you've tried. In XPath, a leading "//" selects matching nodes at any depth, so "//Company" will pick up EVERY element named Company anywhere under the root of the document you pass in, whatever its parent is.
If you only want specific Company nodes, then you can be more specific:
XmlNodeList ocNodesCompany = thisXmlDoc.SelectNodes("//Company");
becomes
XmlNodeList ocNodesCompany = thisXmlDoc.SelectNodes("/OpenCalaisSimple/CalaisSimpleOutputFormat/Company");
Note the key difference is that there is only ONE forward slash at the beginning.
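To make the difference concrete, here is a minimal sketch. The class name XPathSketch, the Count helper and the trimmed sample XML (including the extra Company under Description) are made up for illustration and are not part of the question's data.

```csharp
using System;
using System.Xml;

public static class XPathSketch
{
    // Hypothetical sample: a cut-down version of the question's XML,
    // with one extra Company placed outside CalaisSimpleOutputFormat.
    private const string SampleXml =
        "<OpenCalaisSimple>" +
        "<Description><Company>Elsewhere</Company></Description>" +
        "<CalaisSimpleOutputFormat>" +
        "<Company>BBC</Company>" +
        "<Organization>Red Cross</Organization>" +
        "</CalaisSimpleOutputFormat>" +
        "</OpenCalaisSimple>";

    // Returns how many nodes the given XPath expression selects in the sample.
    public static int Count(string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(SampleXml);
        return doc.SelectNodes(xpath).Count;
    }

    public static void Main()
    {
        // "//" matches Company at any depth: both "Elsewhere" and "BBC".
        Console.WriteLine(Count("//Company")); // 2

        // A single leading "/" anchors the path at the document root,
        // so only the Company under CalaisSimpleOutputFormat matches.
        Console.WriteLine(Count("/OpenCalaisSimple/CalaisSimpleOutputFormat/Company")); // 1
    }
}
```

With the XML from the question the two expressions should select the same four nodes, since every Company there sits under CalaisSimpleOutputFormat.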
I've just tested both variations and they work great.
If you're handling XML files, I would strongly recommend reading up on XPath and becoming fluent in it. It's exceptionally handy for quickly writing code that parses XML files and picks out precisely what you need (though I should add it's not the only way to do it, and it is certainly not appropriate for all circumstances :) ).
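As one example of what XPath allows, the two loops could probably be collapsed into a single query using the XPath union operator "|". This is only a sketch: the UnionSketch class, the GetNames method and the tiny sample XML are hypothetical names for illustration, and it returns List<string> rather than ArrayList.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class UnionSketch
{
    // Collects the inner text of every Company and Organization element.
    public static List<string> GetNames(string ocXml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(ocXml);

        var names = new List<string>();
        // "|" is the XPath 1.0 union operator; SelectNodes returns the
        // combined node set in document order.
        foreach (XmlNode node in doc.SelectNodes("//Company | //Organization"))
        {
            names.Add(node.InnerText);
        }
        return names;
    }

    public static void Main()
    {
        // Made-up input, not the question's XML.
        const string xml =
            "<Root><Company>BBC</Company>" +
            "<Organization>Red Cross</Organization>" +
            "<Topic>Other</Topic></Root>";

        Console.WriteLine(string.Join(", ", GetNames(xml))); // BBC, Red Cross
    }
}
```

One behavioural difference to be aware of: the union comes back in document order, whereas the two separate loops list all companies first and then all organizations. For the XML in the question that happens to be the same order.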
Hope this helps.
Upvotes: 2