Reputation: 6602
How do you traverse a document up to a matching element, act on it, and then continue to the next match? In my example I want to find the first <font> element, grab its text, and keep going until I reach the next <font> or hit a specific tag (<img>). I need to detect the <img> tag as well because I want to do something at that point.
HTML:
<table border=0>
<tr>
<td width=180>
<font size=+1><b>apple</b></font>
</td>
<td>Description of an apple</td>
</tr>
<tr>
<td width=180>
<font size=+1><b>banana</b></font>
</td>
<td>Description of a banana</td>
</tr>
<tr>
<td><img vspace=4 hspace=0 src="common/dot_clear.gif"></td>
</tr>
...Then this repeats itself in a similar format
Current scrape.rb:
#...
document.at_css("body").traverse do |node|
  # if <font> is found
  #   puts text in font
  # else if <img> is found
  #   puts img src and continue loop until end of document
end
Thank you!
Upvotes: 2
Views: 510
Reputation: 13014
Interesting. You basically want to visit every descendant node in the tree and perform an operation that depends on which node you encounter.
Here is one way to do that:
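require 'nokogiri'
require 'open-uri'
# Fetch a sample page (URI.open needs Ruby 2.5+; older versions use plain open)
page = Nokogiri::HTML(URI.open('http://en.wikipedia.org/wiki/Ruby_%28programming_language%29'))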
Now, if you want to traverse every element inside body, XPath comes to the rescue. The expression //body//* matches all descendant elements of body (children, grandchildren, and so on). It returns a Nokogiri::XML::NodeSet whose members are Nokogiri::XML::Element objects:
page.xpath('//body//*')
page.xpath('//body//*').first.node_name
#=> "div"
You can now iterate over that result and handle each element by its name:
page.xpath('//body//*').each do |node|
  case node.name
  when 'div' then # do this
  when 'font' then # do that
  end
end
Upvotes: 1
Reputation: 30300
Something like this perhaps:
document.at_css("body").traverse do |node|
  if node.name == 'font'
    puts node.content
  elsif node.name == 'img'
    puts node['src']
  end
end
Upvotes: 0