Vicheanak

Reputation: 6684

How to change user-agent and delay time in Scrapy?

I'm using Scrapy 0.16.4

I have used this code to change the download delay and user-agent:

DOWNLOAD_DELAY = 2
USER_AGENT = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.22 (KHTML, like Gecko) Chrome/25.0.1364.97 Safari/537.22 AlexaToolbar/alxg-3.1"
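In a Scrapy project these two values normally go in the project's settings.py, which Scrapy loads as a plain Python module. The sketch below assumes that standard layout. RANDOMIZE_DOWNLOAD_DELAY is a standard Scrapy setting that is not in the question; it is included only to make the pacing explicit (check that your Scrapy version documents it). With it enabled, Scrapy waits a random time between 0.5 and 1.5 times DOWNLOAD_DELAY between requests to the same site, so the gaps will not be exactly 2 seconds.

```python
# settings.py (sketch; assumes the standard Scrapy project layout)

# Base wait, in seconds, between consecutive requests to the same site.
DOWNLOAD_DELAY = 2

# When True (Scrapy's default), the actual wait is a random value between
# 0.5 * DOWNLOAD_DELAY and 1.5 * DOWNLOAD_DELAY.
RANDOMIZE_DOWNLOAD_DELAY = True

# Default User-Agent header sent with every request.
USER_AGENT = ("Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.22 "
              "(KHTML, like Gecko) Chrome/25.0.1364.97 Safari/537.22 "
              "AlexaToolbar/alxg-3.1")
```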

I'm not sure whether this is working; either way, I still can't crawl all the pages of that site. The number of scraped items changes from run to run: sometimes I get 13, sometimes 30, and sometimes 52.

What could be the issue?

Upvotes: 0

Views: 5309

Answers (2)

Liqun

Reputation: 4171

Some websites limit the number of requests per IP address. There is a good chance that they do not add up requests made with different user agents (Chrome, Firefox, IE, Safari, etc.), so you could try rotating through a pool of user agents to spread out the heavy traffic.

Here is a link explaining how to do this: "Using random user agent in Scrapy".
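The user-agent pool idea can be sketched as a downloader middleware that picks a random User-Agent for every outgoing request. This is a minimal sketch, not the code from the linked article: the class name and the USER_AGENT_LIST setting are hypothetical, and it assumes a Scrapy version whose middlewares support the from_crawler hook and Settings.getlist. It does not need to import Scrapy, because downloader middlewares are plain classes that Scrapy calls through process_request(request, spider).

```python
import random


class RandomUserAgentMiddleware(object):
    """Hypothetical downloader middleware that rotates the User-Agent header."""

    def __init__(self, user_agents):
        if not user_agents:
            raise ValueError("need at least one user agent")
        self.user_agents = list(user_agents)

    @classmethod
    def from_crawler(cls, crawler):
        # USER_AGENT_LIST is a hypothetical custom setting (a list of strings)
        # defined in settings.py; it is not a built-in Scrapy setting.
        return cls(crawler.settings.getlist("USER_AGENT_LIST"))

    def process_request(self, request, spider):
        # Set the header to a randomly chosen agent for this request.
        request.headers["User-Agent"] = random.choice(self.user_agents)
        # Returning None tells Scrapy to keep processing the request normally.
        return None
```

To enable it, add its import path to DOWNLOADER_MIDDLEWARES with an order number lower than the built-in UserAgentMiddleware (500 in current Scrapy versions) and define USER_AGENT_LIST in settings.py. Because the built-in middleware only fills in User-Agent when the header is missing, the randomly chosen value is kept.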

Upvotes: 4

krbnr

Reputation: 160

Maybe the site is blocking you with a captcha. Print response.url (and the referer) to see whether you are being sent somewhere other than the page you requested. Also try setting DOWNLOAD_DELAY to 10; you can set it in the spider itself and print each URL as it is scraped. If about 10 seconds pass between prints, the delay is working.
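Both checks suggested above can live in small helpers outside the spider logic, sketched below. Both are hypothetical: looks_blocked is a crude keyword heuristic whose marker strings are guesses you would replace with text from the actual block page, and ResponseTimer only records the time between responses so you can see whether the delay is taking effect.

```python
import time

# Hypothetical marker strings; replace them with text seen on the real block page.
BLOCK_MARKERS = ("captcha", "access denied", "unusual traffic")


def looks_blocked(url, body):
    """Heuristic: True if the URL or page text suggests a captcha/block page."""
    text = (url + " " + body).lower()
    return any(marker in text for marker in BLOCK_MARKERS)


class ResponseTimer(object):
    """Records when each response arrives and returns the gap since the last one."""

    def __init__(self, clock=time.time):
        self._clock = clock  # injectable clock, handy for testing
        self._last = None

    def tick(self):
        now = self._clock()
        # No gap is available for the very first response.
        gap = None if self._last is None else now - self._last
        self._last = now
        return gap
```

In a spider you might create one ResponseTimer when the spider starts, call tick() in the parse callback and log the result next to response.url, and pass response.url plus the decoded page text to looks_blocked. Scrapy also accepts a download_delay attribute on the spider class. With a delay of 10 and randomization on, the gaps for a single site should fall roughly between 5 and 15 seconds; gaps close to zero suggest the setting is not being picked up.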

Upvotes: 0
