Reputation:
I'm trying to extract some information from the XML feed provided by Weather Underground.
I can open the resource and pull out the desired elements, but I really want to return the element text as a variable, without the surrounding XML tags, so I can manipulate it and display it on a web page.
Perhaps I could strip off the tags with a regular expression, but I suspect (and hope) there is a more elegant way to do this directly in Nokogiri.
Currently I am using irb to work out the syntax:
irb>require 'rubygems'
irb>require 'nokogiri'
irb>require 'open-uri'
irb>doc = Nokogiri::XML(open('http://api.wunderground.com/auto/wui/geo/WXCurrentObXML/index.xml?query=KBHB'))
=> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
=> <?xml version="1.0"?>
# [...]
<!-- 0.036:0 -->
irb>doc.xpath('/current_observation/weather')
=> <weather>Clear</weather>
irb>doc.xpath('/current_observation/wind_dir')
=> <wind_dir>North</wind_dir>
irb>doc.xpath('/current_observation/wind_mph')
=> <wind_mph>10</wind_mph>
irb>doc.xpath('/current_observation/pressure_string')
=> <pressure_string>31.10 in (1053 mb)</pressure_string>
I need help with the specific syntax. I have tried constructs such as:
doc.xpath.element('/current_observation/weather')
doc.xpath.text('/current_observation/weather')
doc.xpath.node('/current_observation/weather')
doc.xpath.element.text('/current_observation/weather')
All return errors.
Upvotes: 0
Views: 1971
Reputation: 160551
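One of the nice things about Nokogiri is its flexibility when writing accessors. You're not limited to XPath; you can also use CSS selectors. Here, at returns the first node matching the selector, and text returns that node's content without the tags: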
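require 'nokogiri'
require 'open-uri'

# On Ruby 3.0+ open-uri no longer overrides Kernel#open, so use URI.open (available since Ruby 2.5).
doc = Nokogiri::XML(URI.open('http://api.wunderground.com/auto/wui/geo/WXCurrentObXML/index.xml?query=KBHB'))

# Build a hash of symbol => text for each element, using a CSS descendant selector.
weather_report = %w[weather wind_dir wind_mph pressure_string].each_with_object({}) { |n, h|
  h[n.to_sym] = doc.at("current_observation #{n}").text
}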
weather_report # => {:weather=>"Overcast", :wind_dir=>"South", :wind_mph=>"6", :pressure_string=>"29.67 in (1005 mb)"}
Upvotes: 0
Reputation: 33640
In XPath, you can select an element's text node with text().
In your example, doc.xpath('/current_observation/weather/text()') selects the text node inside weather. Note that xpath still returns a NodeSet rather than a String; call .text on the result to get a plain Ruby String.
Upvotes: 1
Reputation: 70142
Something like this works for me:
irb(main):019:0> doc.xpath('//current_observation/weather').first.content
=> "Clear"
Upvotes: 0