Reputation: 298
How can I get all links of a website using the Ruby Mechanize gem? Can Mechanize do what the Anemone gem does here?
Anemone.crawl("https://www.google.com.vn/") do |anemone|
  anemone.on_every_page do |page|
    puts page.url
  end
end
I'm a newbie to web crawling. Thanks in advance!
Upvotes: 6
Views: 3933
Reputation: 45
It's quite simple with Mechanize, and I suggest you read the documentation. You can start with Ruby BastardBook.
To get all links from a page with Mechanize try this:
require 'mechanize'
agent = Mechanize.new
page = agent.get("http://example.com")
page.links.each {|link| puts "#{link.text} => #{link.href}"}
I think the code is clear: page is a Mechanize::Page object that stores the whole content of the retrieved page, and Mechanize::Page provides the links method.
Mechanize is very powerful, but remember that if you only want to scrape, without any interaction with the website (no forms, no clicks), Nokogiri alone is enough. Mechanize itself uses Nokogiri to parse pages, so for pure scraping use Nokogiri directly.
Upvotes: 2