Reputation: 2117
I need to get all URLs from all pages of a given domain.
I think it makes sense to use background jobs, placing them on multiple queues.
I tried the cobweb gem, but it seems very confusing,
and Anemone takes a very long time when there are a lot of pages:
require 'anemone'

Anemone.crawl("http://www.example.com/") do |anemone|
  anemone.on_every_page do |page|
    puts page.links
  end
end
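To sketch the "multiple queues" idea without pulling in a crawler gem, here is a minimal illustration using only Ruby's standard library: a hypothetical `extract_links` helper that pulls same-domain links out of an HTML string, and a `crawl_with_queues` function that spreads URLs across several queues, each drained by its own worker thread. The names and the naive regex-based link extraction are assumptions for illustration; a real crawler should use a proper HTML parser such as Nokogiri, and the workers would fetch pages instead of just echoing URLs.

```ruby
require 'uri'

# Illustrative helper (not Anemone): extract href values from an HTML
# string and keep only links on the same host as base_url. Assumes
# well-formed href="..." attributes; use a real parser in production.
def extract_links(html, base_url)
  base = URI(base_url)
  html.scan(/href="([^"]+)"/).flatten.map do |href|
    URI.join(base, href).to_s rescue nil
  end.compact.select { |url| URI(url).host == base.host }
end

# Distribute crawl jobs across multiple queues, one worker thread per
# queue, roughly as the question suggests for background processing.
def crawl_with_queues(seed_urls, n_queues: 2)
  queues = Array.new(n_queues) { Queue.new }
  seed_urls.each_with_index { |url, i| queues[i % n_queues] << url }

  results = Queue.new
  threads = queues.map do |q|
    Thread.new do
      until q.empty?
        url = q.pop
        results << url  # a real worker would fetch and parse here
      end
    end
  end
  threads.each(&:join)
  Array.new(results.size) { results.pop }
end
```

In a real app the threads would typically be replaced by background workers (e.g. Sidekiq or Resque jobs), with each job fetching one page and enqueueing any newly discovered same-domain links.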
What do you think would fit best?
Upvotes: 0
Views: 1490
Reputation: 6450
You can use Apache Nutch, a highly extensible and scalable open source web crawler project.
Upvotes: 2