Pravesh Jain

Reputation: 4288

Scrapy: Pass HTTP proxy at runtime

I need to set an HTTP proxy for a particular spider in Scrapy. Is there a way to pass an HTTP proxy at runtime in Scrapy?

For example, I can pass user agent during runtime as follows:

scrapy crawl <spidername> -s USER_AGENT='<some user agent>'

Can I pass the HTTP proxy I want to use in a similar manner?

Upvotes: 0

Views: 351

Answers (2)

Bzisch

Reputation: 101

I'm not sure if you can pass a proxy at runtime, but you could implement a class like this in your project's middlewares.py:

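import random

# Placeholder list; replace with your own proxy URLs.
LIST_OF_PROXIES = ['http://proxy1.example.com:3128', 'http://proxy2.example.com:3128']

class CustomProxyMiddleware(object):

    def process_request(self, request, spider):
        # Only route this one spider's requests through a randomly chosen proxy.
        if spider.name == 'particular_spider':
            proxy = random.choice(LIST_OF_PROXIES)
            request.meta['proxy'] = proxy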

You could do the same thing with the user agent:

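import random

# Placeholder list; replace with the user agent strings you want to rotate.
USER_AGENTS = ['<some user agent>', '<another user agent>']

class CustomUserAgentMiddleware(object):

    def process_request(self, request, spider):
        # Only override the User-Agent header for this one spider.
        if spider.name == 'particular_spider':
            agent = random.choice(USER_AGENTS)
            request.headers['User-Agent'] = agent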

Just make sure that you add those classes to DOWNLOADER_MIDDLEWARES in settings.py.
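
To get closer to what the question asks (choosing the proxy on the command line), a middleware of the same shape could read the proxy from a setting instead of a hard-coded list. This is only a sketch under a few assumptions: CUSTOM_HTTP_PROXY is a made-up setting name rather than a built-in Scrapy setting, RuntimeProxyMiddleware and the myproject package are placeholder names, and the priority 350 is just a value chosen to run before the built-in HttpProxyMiddleware (which I believe defaults to 750).

```python
# middlewares.py (sketch)

class RuntimeProxyMiddleware:
    """Set a proxy on each request, taken from a setting given at launch."""

    def __init__(self, proxy):
        self.proxy = proxy

    @classmethod
    def from_crawler(cls, crawler):
        # CUSTOM_HTTP_PROXY is a hypothetical setting name, not a Scrapy built-in.
        return cls(crawler.settings.get("CUSTOM_HTTP_PROXY"))

    def process_request(self, request, spider):
        # Leave the request alone if no proxy was passed or one is already set.
        if self.proxy and "proxy" not in request.meta:
            request.meta["proxy"] = self.proxy


# settings.py (sketch): "myproject" is a placeholder package name.
DOWNLOADER_MIDDLEWARES = {
    "myproject.middlewares.RuntimeProxyMiddleware": 350,
}
```

With something like this in place, the proxy could be supplied the same way as the user agent in the question: scrapy crawl <spidername> -s CUSTOM_HTTP_PROXY='http://www.someproxy.com:3128'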

Upvotes: 1

paul trmbrth

Reputation: 20748

Scrapy understands the http_proxy and https_proxy environment variables (see the HttpProxyMiddleware documentation).

So you can do something like:

http_proxy="http://www.someproxy.com:3128" scrapy crawl <spidername>
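
As far as I know, HttpProxyMiddleware picks these variables up through the standard library's urllib.request.getproxies(), so a quick way to check what it would see is to call that function yourself. The snippet below is only a sanity-check sketch; it sets the variable from Python for illustration, whereas in practice you would set it in the shell as shown above.

```python
import os
from urllib.request import getproxies

# For illustration only: normally this is set in the shell before running scrapy.
os.environ["http_proxy"] = "http://www.someproxy.com:3128"

# getproxies() returns a dict mapping URL scheme to proxy URL.
proxies = getproxies()
print(proxies.get("http"))
```

If the printed value is not the proxy you expect, the variable is probably not reaching the process that runs the crawl.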

Upvotes: 1
