Reputation: 1577
Using Ruby's Nokogiri library, I want to parse an XML document as follows, extracting from it some elements (like "tsn" or "kingdom"):
<ns:searchByScientificNameResponse xmlns:ns="http://itis_service.itis.usgs.gov">
<ns:return xmlns:ax21="http://data.itis_service.itis.usgs.gov/xsd" xmlns:ax23="http://metadata.itis_service.itis.usgs.gov/xsd" xmlns:ax26="http://itis_service.itis.usgs.gov/xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="ax21:SvcScientificNameList">
<ax21:scientificNames xsi:type="ax21:SvcScientificName">
<ax21:tsn>26339</ax21:tsn>
<ax21:author>L.</ax21:author>
<ax21:combinedName>Vicia faba</ax21:combinedName>
<ax21:kingdom>Plantae</ax21:kingdom>
<ax21:unitInd1 xsi:nil="true" />
<ax21:unitInd2 xsi:nil="true" />
<ax21:unitInd3 xsi:nil="true" />
<ax21:unitInd4 xsi:nil="true" />
<ax21:unitName1>Vicia</ax21:unitName1>
<ax21:unitName2>faba</ax21:unitName2>
<ax21:unitName3 xsi:nil="true" />
<ax21:unitName4 xsi:nil="true" />
</ax21:scientificNames>
</ns:return>
</ns:searchByScientificNameResponse>
After opening the document with
doc = Nokogiri::XML(File.open("sample.xml"))
if I use
tsn = doc.at_xpath("//tsn")
puts tsn
I get a nil value, and if I use
tsn = doc.at_xpath("//:tsn")
I get an error: Nokogiri::XML::XPath::SyntaxError (ERROR: Invalid expression: //:tsn)
Could someone out there give me some help?
Upvotes: 0
Views: 746
Reputation: 29308
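The issue is that your XML uses namespaces: the tsn element belongs to the ax21 namespace, so the unprefixed //tsn matches nothing.
There are two options: strip the namespaces from the document (first snippet), or pass the namespace mapping to the XPath call (second snippet):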
doc.remove_namespaces!
doc.at_xpath("//tsn")
#=> #<Nokogiri::XML::Element:0x2add795ea3b8 name="tsn" children=[#<Nokogiri::XML::Text:0x2add795e5f70 "26339">]>
doc.at_xpath("//ax21:tsn", 'ax21' => "http://data.itis_service.itis.usgs.gov/xsd")
#=> #<Nokogiri::XML::Element:0x2add795ea3b8 name="tsn" children=[#<Nokogiri::XML::Text:0x2add795e5f70 "26339">]>
Based on the comments, it seems you are only interested in the text of that node. You can retrieve it in several ways (these examples assume remove_namespaces! has already been called, as in the first option):
doc.at_xpath("//tsn").text()
#=> "26339"
doc.at_xpath("//tsn/text()").to_s
#=> "26339"
# If you want tsn and kingdom at the same time
doc.xpath('//tsn/text() | //kingdom/text()').map(&:to_s)
#=> ["26339", "Plantae"]
Upvotes: 1
Reputation: 73
Here's what I came up with:
require 'nokogiri'
doc = Nokogiri::XML(File.open("sample.xml"))
node_names = []
doc.xpath('//*').each do |node|
  node_names << node.name
end
print node_names
#=>["ns:searchByScientificNameResponse", "ns:return", "ax21:scientificNames", "ax21:tsn", "ax21:author", "ax21:combinedName", "ax21:kingdom", "ax21:unitInd1", "ax21:unitInd2", "ax21:unitInd3", "ax21:unitInd4", "ax21:unitName1", "ax21:unitName2", "ax21:unitName3", "ax21:unitName4"]
node_names.each do |elem|
  # Print only the two element names of interest
  if elem == "ax21:kingdom" || elem == "ax21:tsn"
    puts elem
  end
end
#=>ax21:tsn
#=>ax21:kingdom
I'm not sure whether this is what you want, so here is a link to the documentation that led me to this solution: https://gist.github.com/carolineartz/10276637
Upvotes: 0