Jack Daniel

Reputation: 2611

How to use Downloader Middleware in Scrapy

I am using Scrapy to scrape some web pages. I wrote a custom ProxyMiddleware class and implemented my requirement in its process_request(self, request, spider) method. Here is my code (copied):

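import random

from scrapy.downloadermiddlewares.httpproxy import HttpProxyMiddleware


class ProxyMiddleware(HttpProxyMiddleware):
    # Placeholder: fill in the actual proxy URLs.
    proxy_list = [list of proxies]

    def __init__(self, proxy_ip=''):
        self.proxy_ip = proxy_ip

    def process_request(self, request, spider):
        ip = random.choice(self.proxy_list)
        if ip:
            request.meta['proxy'] = ip
        # Return None so the request continues through the middleware chain;
        # returning the request itself would make Scrapy reschedule it.
        return None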

What I don't understand is how Scrapy will use my implementation instead of the default class. After some searching, my understanding is that I need to change settings.py:

DOWNLOADER_MIDDLEWARES = {
    'IPProxy.middlewares.MyCustomDownloaderMiddleware': 543,
    'IPProxy.IPProxy.spiders.RandomProxy': 600
}

To make my project structure clearer to readers, I added the second entry to the dictionary with an arbitrary priority value. My project structure is:

[Image: project structure]

My question is,

Upvotes: 5

Views: 8336

Answers (1)

alecxe

Reputation: 473933

If you want to disable the built-in downloader middleware (HttpProxyMiddleware, I assume), set its value in DOWNLOADER_MIDDLEWARES to None:

DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': None,
    'IPProxy.middlewares.MyCustomDownloaderMiddleware': 543,
    'IPProxy.IPProxy.spiders.RandomProxy': 600
}
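
For the custom side, here is a minimal sketch of what a registered proxy middleware could look like. It rests on a few assumptions: the class name RandomProxyMiddleware and the PROXY_LIST setting are made up for illustration (neither ships with Scrapy), the proxy URLs are placeholders, and the module path in the commented settings assumes the class lives in IPProxy/middlewares.py. A downloader middleware does not have to subclass anything; Scrapy only looks for hook methods such as from_crawler and process_request.

```python
import random


class RandomProxyMiddleware:
    """Assign a random proxy to each outgoing request.

    Hypothetical sketch: the class name and the PROXY_LIST setting are
    assumptions for illustration, not part of Scrapy itself.
    """

    def __init__(self, proxy_list):
        self.proxy_list = list(proxy_list)

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy calls this hook to build the middleware. PROXY_LIST is a
        # custom setting assumed to be defined in settings.py.
        return cls(crawler.settings.getlist("PROXY_LIST"))

    def process_request(self, request, spider):
        # Leave requests alone if they already carry an explicit proxy.
        if self.proxy_list and "proxy" not in request.meta:
            request.meta["proxy"] = random.choice(self.proxy_list)
        # Returning None lets the request continue through the chain.
        return None


# settings.py (placeholder values, adjust to your project):
# PROXY_LIST = ["http://proxy1.example:8080", "http://proxy2.example:8080"]
# DOWNLOADER_MIDDLEWARES = {
#     "IPProxy.middlewares.RandomProxyMiddleware": 543,
# }
```

The key in DOWNLOADER_MIDDLEWARES has to be the importable dotted path of the class, so it must match where the class is actually defined. As far as I know, the built-in HttpProxyMiddleware sits at priority 750 and respects a request.meta['proxy'] that is already set, so with a priority of 543 the custom middleware runs first and disabling the built-in one is optional.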

Upvotes: 1
