1Rhino

Reputation: 298

How can I get all links of a website using the Mechanize gem?

How can I get all the links of a website using the Ruby Mechanize gem? Can Mechanize do what the Anemone gem does, like this:

Anemone.crawl("https://www.google.com.vn/") do |anemone|
  anemone.on_every_page do |page|
    puts page.url
  end
end

I'm a newbie at web crawling. Thanks in advance!

Upvotes: 6

Views: 3933

Answers (1)

MonkTools

Reputation: 45

It's quite simple with Mechanize, and I suggest you read the documentation. You can start with Ruby BastardBook.

To get all links from a page with Mechanize try this:

require 'mechanize'

agent = Mechanize.new
page = agent.get("http://example.com")
page.links.each {|link| puts "#{link.text} => #{link.href}"}

The code should be clear: page is a Mechanize::Page object that stores the whole content of the retrieved page, and Mechanize::Page provides the links method.

Mechanize is very powerful, but remember that if you want to scrape a site without interacting with it (no forms, no clicks), Nokogiri alone is enough. Mechanize itself uses Nokogiri to parse pages, so for pure scraping use Nokogiri directly.

Upvotes: 2
