Reputation: 1221
I've read a couple of articles and Stack Overflow posts on this topic, so I apologize if I'm repeating an existing question. Is there a way to search the HTML source of a given URL and return the text of a header tag?
Example:
<h2 class='title'>
<a href="/blog/step-by-step-guide-to-building-your-first-ruby-gem">Step-by-Step Guide to Building Your First Ruby Gem</a>
</h2>
The code looks for the <h2> tag and returns "Step-by-Step Guide to Building Your First Ruby Gem". I know the Nokogiri gem can search for nodes with an XPath expression:
doc.xpath('//h3/a').each do |link|
  puts link.content
end
Is there a method that would let me do something like
doc.html('h1').each do |tag| puts tag.content end
I hope that makes sense. Any insight or pointer to a resource would be much appreciated.
Upvotes: 1
Views: 76
Reputation: 198324
Nokogiri has both XPath and CSS accessors, so you can do
doc.css('h1 > a').each do |tag| puts tag.content end
if you don't like XPath. (Or just 'h1'; I'm not 100% sure whether you want the text of the links inside headers or of the headers themselves.)
Upvotes: 1