Aydar Omurbekov

Reputation: 2117

Web crawler in Rails: how to crawl all pages of a site

I need to get all URLs from every page of a given domain.
I think it makes sense to use background jobs, placing them on multiple queues.

I tried the cobweb gem, but it seems very confusing, and Anemone takes a long time to finish when the site has a lot of pages:

require 'anemone'

Anemone.crawl("http://www.example.com/") do |anemone|
  anemone.on_every_page do |page|
    puts page.links
  end
end
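
For the background-job idea, here is a rough sketch of what I have in mind, assuming Sidekiq is set up; the PageLinksWorker class and the crawler queue name are just placeholders:

require 'anemone'
require 'sidekiq'

# Placeholder worker: processes the links found on one page so the
# crawl loop itself stays fast.
class PageLinksWorker
  include Sidekiq::Worker
  sidekiq_options queue: 'crawler' # assumed queue name

  def perform(page_url, links)
    # e.g. persist the URLs or enqueue further processing here
    puts "#{page_url}: #{links.size} links"
  end
end

Anemone.crawl("http://www.example.com/", threads: 4, depth_limit: 5) do |anemone|
  anemone.on_every_page do |page|
    # Hand each page's links off to the background queue.
    PageLinksWorker.perform_async(page.url.to_s, page.links.map(&:to_s))
  end
end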

What do you think would fit me best?

Upvotes: 0

Views: 1490

Answers (1)

ajknzhol

Reputation: 6450

You can use Apache Nutch, a highly extensible and scalable open-source web crawler project.

Upvotes: 2
