I am using Scrapy to scrape some web pages. I wrote a custom ProxyMiddleware class and implemented my requirement in its process_request(self, request, spider) method. Here is my code (copied):
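import random


class ProxyMiddleware(object):  # downloader middlewares need no base class
    proxy_list = []  # list of proxies, e.g. 'http://host:port'

    def __init__(self, proxy_ip=''):
        self.proxy_ip = proxy_ip

    def process_request(self, request, spider):
        if self.proxy_list:
            # pick a random proxy for this request
            request.meta['proxy'] = random.choice(self.proxy_list)
        # None lets Scrapy keep processing the request; returning the
        # request itself would reschedule it instead of downloading it
        return None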
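Scrapy's documentation gives each return value of process_request() a specific meaning: None lets processing continue down the middleware chain, a Response skips the download, and a Request stops the chain and sends that request back to the scheduler. The snippet below is a simplified, self-contained model of that behaviour, not Scrapy's real code; `Request`, `run_process_request`, `SetProxy` and `ReturnRequest` are hypothetical stand-ins. It shows why a process_request that ends with `return request` keeps the request from ever being downloaded.

```python
class Request:
    """Hypothetical stand-in for scrapy.Request: just a URL and a meta dict."""

    def __init__(self, url):
        self.url = url
        self.meta = {}


def run_process_request(middlewares, request, spider=None):
    """Simplified model of Scrapy calling process_request on each middleware.

    middlewares is a list of (order, middleware) pairs; lower orders run first.
    Returns an (outcome, obj) tuple describing what Scrapy would do next.
    """
    for _, mw in sorted(middlewares, key=lambda pair: pair[0]):
        result = mw.process_request(request, spider)
        if result is None:
            continue  # keep going down the chain
        if isinstance(result, Request):
            return ('reschedule', result)  # chain stops, request goes back to the scheduler
        return ('response', result)  # e.g. a Response: the download is skipped
    return ('download', request)  # every middleware returned None


class SetProxy:
    def process_request(self, request, spider):
        request.meta['proxy'] = 'http://127.0.0.1:8080'  # placeholder proxy URL
        return None


class ReturnRequest:
    def process_request(self, request, spider):
        request.meta['proxy'] = 'http://127.0.0.1:8080'
        return request  # ends with `return request`


print(run_process_request([(543, SetProxy())], Request('http://example.com'))[0])       # download
print(run_process_request([(543, ReturnRequest())], Request('http://example.com'))[0])  # reschedule
```

Real Scrapy has more cases (for example, raising IgnoreRequest), but the None-versus-Request distinction is the part that matters for a proxy-setting middleware.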
Now, I don't understand how Scrapy will pick up my implementation instead of the default class. After some searching and brainstorming, my understanding is that I need to make changes in settings.py:
DOWNLOADER_MIDDLEWARES = {
    'IPProxy.middlewares.MyCustomDownloaderMiddleware': 543,
    'IPProxy.IPProxy.spiders.RandomProxy': 600
}
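Each key in DOWNLOADER_MIDDLEWARES is a dotted import path. Scrapy resolves it to a class (its own helper for this is scrapy.utils.misc.load_object) and then instantiates it, so the last component of the path has to be the class name (ProxyMiddleware in the code above) and the part before it the module that defines that class. The snippet below is a simplified sketch of that lookup using importlib; `load_object_sketch` is a hypothetical name, not Scrapy's own code.

```python
import importlib


def load_object_sketch(path):
    """Resolve a dotted path like 'package.module.ClassName' to the object it names."""
    module_path, _, name = path.rpartition('.')
    if not module_path:
        raise ValueError(f"Not a full dotted path: {path!r}")
    module = importlib.import_module(module_path)  # raises ImportError for a bad module path
    try:
        return getattr(module, name)
    except AttributeError:
        raise NameError(f"Module {module_path!r} has no object named {name!r}") from None


print(load_object_sketch('collections.OrderedDict'))  # <class 'collections.OrderedDict'>
```

A key whose class name does not exist in the named module fails at this lookup step rather than silently falling back to a default middleware.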
To help readers understand my project structure, I added a second entry to the dict with an arbitrary value. My project structure is:
My question is,
If you want to disable the built-in HttpProxyMiddleware downloader middleware (assuming that is the one you mean), set its value in DOWNLOADER_MIDDLEWARES to None:
DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': None,
    'IPProxy.middlewares.MyCustomDownloaderMiddleware': 543,
    'IPProxy.IPProxy.spiders.RandomProxy': 600
}
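Roughly, Scrapy merges the project's DOWNLOADER_MIDDLEWARES with its built-in DOWNLOADER_MIDDLEWARES_BASE, drops every entry whose value is None, and sorts the rest by order value, with lower values running process_request first. The function below is a simplified sketch of that merge, not Scrapy's own implementation; `build_middleware_order` is a hypothetical name, and BASE holds only two of the default entries. The 500 and 750 order values are what recent Scrapy versions use, so check the defaults of your installed version.

```python
def build_middleware_order(base, custom):
    """Merge default and project middlewares, drop disabled ones, sort by order."""
    merged = dict(base)
    merged.update(custom)  # project settings override the defaults
    enabled = {path: order for path, order in merged.items() if order is not None}
    return [path for path, _ in sorted(enabled.items(), key=lambda item: item[1])]


# A small subset of Scrapy's DOWNLOADER_MIDDLEWARES_BASE (values may differ by version)
BASE = {
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': 500,
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 750,
}
CUSTOM = {
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': None,  # disabled
    'IPProxy.middlewares.MyCustomDownloaderMiddleware': 543,
}

print(build_middleware_order(BASE, CUSTOM))
# → ['scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', 'IPProxy.middlewares.MyCustomDownloaderMiddleware']
```

Setting the value to None removes the built-in entry entirely, while the custom middleware slots in between the remaining defaults according to its order number.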