ShaunK

Reputation: 1221

Grabbing the text from HTML source code of URL using Ruby

I've read a couple of articles and posts on Stack Overflow about this topic, so I apologize if I'm repeating someone else's question. Is there a way to iterate through the HTML source of a given URL and return the text of a header tag?

Example:

<h2 class='title'>
<a href="/blog/step-by-step-guide-to-building-your-first-ruby-gem">Step-by-Step Guide to Building Your First Ruby Gem</a>
</h2>

The code would look for the <h2> tag and return "Step-by-Step Guide to Building Your First Ruby Gem". I know the Nokogiri gem can search for nodes with an XPath expression:

doc.xpath('//h3/a').each do |link|
  puts link.content
end

Is there something similar that would let me do the following?

doc.html('h1').each do |tag|
  puts tag.content
end

I hope that makes sense. Any insight or pointer to a resource would be much appreciated.

Upvotes: 1

Views: 76

Answers (1)

Amadan

Reputation: 198324

Nokogiri has both XPath and CSS accessors, so you can do

doc.css('h1 > a').each do |link|
  puts link.content
end

if you don't like XPath. (Or just use 'h1'; I'm not 100% sure whether you want the text of the links inside the headers or of the headers themselves.)

Upvotes: 1
