Reputation: 4288
I need to set an HTTP proxy for a particular spider in Scrapy. Is there a way to pass an HTTP proxy at runtime?
For example, I can pass the user agent at runtime as follows:
scrapy crawl <spidername> -s USER_AGENT='<some user agent>'
Can I pass the HTTP proxy I want to use in a similar manner?
Upvotes: 0
Views: 351
Reputation: 101
I'm not sure if you can pass a proxy at runtime, but you could implement a class like this in middlewares.py:
import random

class CustomProxyMiddleware(object):
    def process_request(self, request, spider):
        # LIST_OF_PROXIES is your own list of proxy URLs
        if spider.name == 'particular_spider':
            proxy = random.choice(LIST_OF_PROXIES)
            request.meta['proxy'] = proxy
You could do the same thing with the user agent:
class CustomUserAgentMiddleware(object):
    def process_request(self, request, spider):
        # needs `import random`; USER_AGENTS is your own list of user agent strings
        if spider.name == 'particular_spider':
            agent = random.choice(USER_AGENTS)
            request.headers['User-Agent'] = agent
Just make sure that you add those classes to DOWNLOADER_MIDDLEWARES in settings.py, otherwise Scrapy will not call them.
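If you want the proxy to come from the command line, one option is to have the middleware read a custom setting. This is only a minimal sketch: the setting name CUSTOM_HTTP_PROXY, the class name RuntimeProxyMiddleware, the module path myproject.middlewares and the priority 350 are all made up for illustration. It relies on Scrapy calling from_crawler on middlewares and on the built-in HttpProxyMiddleware sitting at priority 750, so a lower number runs first.

```python
class RuntimeProxyMiddleware:
    """Sets request.meta['proxy'] from a custom setting (hypothetical name)."""

    def __init__(self, proxy_url=None):
        self.proxy_url = proxy_url

    @classmethod
    def from_crawler(cls, crawler):
        # CUSTOM_HTTP_PROXY is an assumed setting name, not a built-in one.
        return cls(proxy_url=crawler.settings.get("CUSTOM_HTTP_PROXY"))

    def process_request(self, request, spider):
        # Leave requests alone if no proxy was given or one is already set.
        if self.proxy_url and "proxy" not in request.meta:
            request.meta["proxy"] = self.proxy_url
        return None  # let Scrapy continue processing the request


# settings.py (module path and priority are assumptions for this sketch)
DOWNLOADER_MIDDLEWARES = {
    "myproject.middlewares.RuntimeProxyMiddleware": 350,
}
```

With that in place, something like this should work:
scrapy crawl <spidername> -s CUSTOM_HTTP_PROXY='http://www.someproxy.com:3128'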
Upvotes: 1
Reputation: 20748
Scrapy understands the http_proxy and https_proxy environment variables (see the HttpProxyMiddleware documentation).
So you can do something like:
http_proxy="http://www.someproxy.com:3128" scrapy crawl <spidername>
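As far as I know, HttpProxyMiddleware picks these variables up through the standard library's urllib.request.getproxies(), so you can check what it would see without running a crawl. A small sketch, assuming the same example proxy URL as above (the function name proxies_from_environment is just for illustration):

```python
import os
from urllib.request import getproxies


def proxies_from_environment():
    """Return the scheme -> proxy URL mapping found in the environment."""
    return getproxies()


# Simulate what the shell prefix `http_proxy="..." scrapy crawl ...` does.
os.environ["http_proxy"] = "http://www.someproxy.com:3128"
os.environ["https_proxy"] = "http://www.someproxy.com:3128"

proxies = proxies_from_environment()
print(proxies.get("http"))
print(proxies.get("https"))
```

If both lines print the proxy URL, the variables are set the way the middleware expects.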
Upvotes: 1