Reputation: 319
I am writing a crawlspider using Scrapy and I use a downloader middleware to rotate user agents for each request. What I would like to know if there is a way to temporize this. In other words, I would like to know if it is possible to tell the spider to change User Agent every X seconds. I thought that, maybe, using the DOWNLOAD_DELAY setting to do this would do the trick.
Upvotes: 2
Views: 1640
Reputation: 473903
You might approach it a bit differently. Since you have control over the requests/sec crawling speed via CONCURRENT_REQUESTS
, DOWNLOAD_DELAY
and other relevant settings, you might just count how many requests in a row would go with the same User-Agent header.
Something along these lines (based on scrapy-fake-useragent
) (not tested):
from fake_useragent import UserAgent
class RotateUserAgentMiddleware(object):
def __init__(self, settings):
# let's make it configurable
self.rotate_user_agent_freq = settings.getint('ROTATE_USER_AGENT_FREQ')
self.ua = UserAgent()
self.request_count = 0
self.current_user_agent = self.ua.random
def process_request(self, request, spider):
if self.request_count >= self.rotate_user_agent_freq:
self.current_user_agent = self.ua.random
self.request_count = 0
else:
self.request_count += 1
request.headers.setdefault('User-Agent', self.current_user_agent)
This might be not particularly accurate since there also could be retries and other reasons that can theoretically screw up the count - test it please.
Upvotes: 3