Reputation: 516
I'm trying to parse an HTML file with the following format at the required section:
<div style="something">
<div class="link">
<a href="http://..." class="headline">Headline</a>
</div>
<div class="text">
Text summary is here
</div>
repeating...
</div>
I want to output each headline followed by its text:
HEADLINE
Text goes here.
HEADLINE
Text goes here.
Currently I can search for the <a> tags with class="headline" to get a list, do the same with the text divs, and then iterate through both lists to output each headline and its text in sequence.
Can I get Hpricot/Nokogiri to save it in that order while it is parsing the file?
Upvotes: 0
Views: 744
Reputation: 37507
Sure.
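require 'nokogiri'

# html holds the page source as a string
doc = Nokogiri::HTML(html)
# For each headline link, step up to its div.link parent and take the next sibling div
doc.xpath('//a[@class="headline"]').each do |headline|
  puts headline.text
  puts headline.xpath('../following-sibling::div[1]').text.strip
end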
Upvotes: 3