Reputation: 109
I have a partial HTML document:
<h2>Destinations</h2>
<div>It is nice <b>anywhere</b> but here.
<ul>
<li>Florida</li>
<li>New York</li>
</ul>
<h2>Shopping List</h2>
<ul>
<li>Booze</li>
<li>Bacon</li>
</ul>
On every <li>
item, I want to know the category the item is in, e.g., the text in the <h2>
tags.
This code does not work, but this is what I'm trying to do:
@page.search('li').each do |li|
li.previous('h2').text
end
Upvotes: 3
Views: 2272
Reputation: 37517
You are close.
@page.search('li').each do |li|
category = li.xpath('../preceding-sibling::h2').text
puts "#{li.text}: category #{category}"
end
Upvotes: 1
Reputation: 7714
Nokogiri allows you to use xpath expressions to locate an element:
categories = []
doc.xpath("//li").each do |elem|
categories << elem.parent.xpath("preceding-sibling::h2").last.text
end
categories.uniq!
p categories
The first part looks for all "li" elements, then inside, we look for the parent (ul, ol), the for an element before (preceding-sibling) which is an h2. There can be more than one, so we take the last (ie, the one closest to the current position).
We need to call "uniq!" as we get the h2 for each 'li' (as the 'li' is the starting point).
Using your own HTML example, this code output:
["Destinations", "Shopping List"]
Upvotes: 4
Reputation: 1
The code:
categories = []
Nokogiri::HTML("yours HTML here").css("h2").each do |category|
categories << category.text
end
The result:
categories = ["Destinations", "Shopping List"]
Upvotes: -2