Nilesh Guria

Reputation: 139

What is a good crawling speed rate?

I'm crawling web pages to build a search engine and have been able to crawl close to 9300 pages in 1 hour using Scrapy. I'd like to know how much more I can improve, and what value is considered a 'good' crawling speed.

Upvotes: 5

Views: 865

Answers (3)

Defalt Zer0

Reputation: 1

It really depends, but you can always benchmark crawling on your own hardware by running scrapy bench from the command line.

Upvotes: 0

eLRuLL

Reputation: 18799

Short answer: There is no real recommended speed for creating a search engine.

Long answer:

Crawling speed, in general, doesn't really determine whether your crawler is good or bad, or even whether it will work as the program that feeds your search engine.

You also cannot talk about a single crawling speed when crawling a lot of pages across multiple sites. Crawling speed should be determined per site: the crawler should be configurable so that you can change how often it hits any specific site at any given time. Google offers this too.
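A minimal, library-agnostic sketch of the per-site idea, assuming a crawler that asks before each request how long it must wait for that host. The class PerDomainThrottle, its methods and the delay values are hypothetical and only illustrate the concept; in Scrapy itself the corresponding knobs are settings such as DOWNLOAD_DELAY, CONCURRENT_REQUESTS_PER_DOMAIN and AUTOTHROTTLE_ENABLED.

```python
import time
from urllib.parse import urlsplit


class PerDomainThrottle:
    """Hypothetical helper: enforce a minimum delay between hits to the same host."""

    def __init__(self, default_delay=1.0, overrides=None):
        self.default_delay = default_delay      # seconds between hits to one host
        self.overrides = dict(overrides or {})  # host -> custom delay in seconds
        self.last_hit = {}                      # host -> time of the last request

    def delay_for(self, host):
        """Configured delay for a host, falling back to the default."""
        return self.overrides.get(host, self.default_delay)

    def wait_time(self, url, now=None):
        """Seconds to wait before `url` can be fetched without exceeding its host's rate."""
        now = time.monotonic() if now is None else now
        host = urlsplit(url).hostname or ""
        last = self.last_hit.get(host)
        if last is None:
            return 0.0  # host never contacted, no need to wait
        return max(0.0, self.delay_for(host) - (now - last))

    def record(self, url, now=None):
        """Mark that a request to `url`'s host was just sent."""
        now = time.monotonic() if now is None else now
        self.last_hit[urlsplit(url).hostname or ""] = now


throttle = PerDomainThrottle(default_delay=1.0, overrides={"example.org": 5.0})
throttle.record("https://example.org/a", now=100.0)
print(throttle.wait_time("https://example.org/b", now=101.0))  # 4.0
print(throttle.wait_time("https://example.com/", now=101.0))   # 0.0
```

Keeping the delay keyed by host means a slow or sensitive site can be given its own limit without slowing the rest of the crawl.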

As for the rate you mentioned (9300/hour), it means you are collecting ~2.6 pages per second, which I would say is not bad, but as explained before, it doesn't tell you much about your end goal (creating a search engine).
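The arithmetic behind that figure, plus a projection of what the same rate means at a larger scale. The 1,000,000-page target is an arbitrary example size chosen for illustration, not a number from the question.

```python
def pages_per_second(pages, seconds):
    """Average crawl rate over a measured interval."""
    return pages / seconds


def hours_to_crawl(total_pages, rate_pps):
    """Hours a single crawler running at `rate_pps` pages/second needs for `total_pages`."""
    return total_pages / rate_pps / 3600


rate = pages_per_second(9300, 3600)
print(f"{rate:.2f} pages/s")                                    # 2.58 pages/s
print(f"{rate * 86400:,.0f} pages/day")                          # 223,200 pages/day
print(f"{hours_to_crawl(1_000_000, rate):.1f} h for 1M pages")   # 107.5 h for 1M pages
```

At this rate a single process needs roughly four and a half days for a million pages, which is a small fraction of the web.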

Also, if you really decide to implement a broad crawler for a search engine with Scrapy, you'll never run just a single Scrapy process. You'll need thousands of spiders (or even more) running to gather the information needed. You'll also have to set up separate services to help you maintain those spiders and coordinate how they behave across processes. For starters, I would recommend checking out Frontera and Scrapyd.
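One common way to split a broad crawl across many spider processes while keeping the per-site limits consistent is to partition URLs by host, so each site is owned by exactly one process. This is a generic sketch under that assumption, not Frontera's or Scrapyd's API; partition_for and the partition count are hypothetical. Each process would only crawl URLs whose partition matches its own index.

```python
import hashlib
from urllib.parse import urlsplit


def partition_for(url, num_partitions):
    """Hypothetical helper: map a URL to a worker partition by hashing its hostname.

    All URLs from the same host land in the same partition, so one worker
    owns each site and can apply that site's rate limit by itself.
    hashlib is used instead of hash(), because str hashes are salted per process
    and would give different partitions in different workers.
    """
    host = urlsplit(url).hostname or ""  # urlsplit already lowercases the hostname
    digest = hashlib.sha1(host.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


urls = [
    "https://example.com/a",
    "https://example.com/b",
    "https://example.org/",
]
for u in urls:
    print(u, "->", partition_for(u, 4))
```

Because the mapping depends only on the hostname, adding more URLs from a known site never moves that site to a different worker.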

Upvotes: 7

Matt Turner

Reputation: 13

I'm no expert, but I would say your speed is pretty slow. I just went to Google, typed in the word "hats", pressed Enter and got: about 650,000,000 results (0.63 seconds). That's going to be tough to compete with. I'd say there's plenty of room for improvement.

Upvotes: -1
