I'm trying to learn scripting with Ruby, and this is my first problem.
I have an HTML file which contains states and their cities. I need to be able to access the cities and know which state they belong to in my Ruby code, so I plan on parsing the HTML and creating a hash for each city, like this: {New York => New York City}.
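For reference, here is a plain-Ruby sketch of the data shape this plan could lead to. There is no parsing yet. It assumes a single hash keyed by city instead of one hash per city, because one lookup then answers "which state is this city in?". The city names are illustrative, and the reverse view built with group_by and transform_values (Ruby 2.4+) is an extra the question does not ask for.

```ruby
# One hash for all cities: city name => state name (sample data, illustrative only).
city_to_state = {
  "New York City" => "New York",
  "Buffalo"       => "New York",
  "Los Angeles"   => "California"
}

# Look up which state a city belongs to.
puts city_to_state["Buffalo"]   # prints New York

# Reverse view: state name => array of its cities.
state_to_cities = city_to_state
  .group_by { |_city, state| state }              # state => [[city, state], ...]
  .transform_values { |pairs| pairs.map(&:first) } # keep only the city names
p state_to_cities
```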
I'm attempting to use Nokogiri, which I'm just learning now.
<h4>State</h4>
<ul>
<li>city</li>
<li>city</li>
<li>city</li>
</ul>
<h4>State</h4>
<ul>
<li>city</li>
<li>city</li>
<li>city</li>
</ul>
<h4>State</h4>
<ul>
<li>city</li>
<li>city</li>
<li>city</li>
</ul>
I'm using this to get the states into an array:
require 'nokogiri'
page = Nokogiri::HTML(File.read("to_parse.html"))
states = Array.new(100)   # two separate statements: "a = x, i = 0" on one line would make a a two-element array
index = 0
page.css('h4').each do |s|
  states[index] = s.text
  puts states[index]
  index += 1
end
This doesn't really help, though. I need to figure out how to get Nokogiri to parse the elements of each list into hashes containing the city and its state. I'm not sure how to make a loop stop when it finishes one state's city list and start a new set of hashes for the next state's list.
I'm thinking I'll have to create a hash for each list element and store the text of that list's h4 tag inside each hash, so I know which state the city belongs to. That's the part I'm not sure how to do.
Feel free to offer some advice on refactoring what I've got, as I know it could be done better.
XPath selectors can help you out here.
# doc is the parsed document (page in your code)
states = doc.css('li').map do |city|
  state = city.xpath('../preceding-sibling::h4[1]')
  [city.text, state.text]
end.to_h
#=> {'city' => 'State', ...}
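This grabs all the li city elements, then traces back to each one's state. The XPath reads like so: .. = up one level (to the enclosing ul), preceding-sibling::h4 = the h4 elements that come before that ul, [1] = the first such element. Because preceding-sibling is a reverse axis, "first" means the nearest h4, i.e. the heading directly above the list.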
Some comments on your code: in Ruby you don't need to preallocate arrays, and with Enumerable methods like map you never need to track index variables in loops.
Note that the final to_h only works in Ruby 2.1 or later.
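On Ruby versions before 2.1, the same array of [city, state] pairs can be turned into a hash with Hash[], which accepts an array of two-element arrays. A plain-Ruby sketch with made-up pairs:

```ruby
# Pairs in the shape produced by the map block: [city, state].
pairs = [["New York City", "New York"], ["Los Angeles", "California"]]

# Hash[] works on old and new Rubies alike;
# on 2.1+ pairs.to_h returns the same hash.
states = Hash[pairs]

p states
```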