Reputation: 164
I've written a script in Scrapy that sends a request through a custom middleware so that the request is proxied. However, the middleware doesn't seem to have any effect. When I print response.meta, I get {'download_timeout': 180.0, 'download_slot': 'httpbin.org', 'download_latency': 0.9680554866790771}, which clearly indicates that my request is not passing through the custom middleware. I've used CrawlerProcess to run the script.
The spider contains:
import scrapy
from scrapy.crawler import CrawlerProcess

class ProxySpider(scrapy.Spider):
    name = "proxiedscript"
    start_urls = ["https://httpbin.org/ip"]

    def parse(self, response):
        print(response.meta)
        print(response.text)

if __name__ == "__main__":
    c = CrawlerProcess({'USER_AGENT': 'Mozilla/5.0'})
    c.crawl(ProxySpider)
    c.start()
The middleware contains:
class ProxiesMiddleware(object):
    def process_request(self, request, spider):
        request.meta['proxy'] = 'http://206.189.25.70:3128'
        return request
The change I've made in settings.py:
DOWNLOADER_MIDDLEWARES = {
    'scrapy.contrib.downloadermiddleware.httpproxy.HttpProxyMiddleware': 110,
    'proxyspider.middleware.ProxiesMiddleware': 100,
}
The following image shows the project hierarchy:
What change should I make so that the request is proxied through the middleware?
Upvotes: 1
Views: 263
Reputation: 3561
You need to check the log output of this line: [scrapy.middleware] INFO: Enabled downloader middlewares:
for the list of active downloader middlewares. Your middleware should appear in that list if it's active.
As far as I remember, usage of scrapy.contrib modules is deprecated now. See:
Scrapy: No module named 'scrapy.contrib'
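With the deprecated path replaced, a settings.py sketch would look like the following (the built-in HttpProxyMiddleware is enabled by default, so strictly only the custom entry needs to be added):

```python
# settings.py — 'scrapy.contrib.downloadermiddleware.httpproxy' is the
# deprecated path; the current module path is 'scrapy.downloadermiddlewares'.
DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 110,
    'proxyspider.middleware.ProxiesMiddleware': 100,
}
```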
Your code with the custom middleware is nearly ready for use with the scrapy command-line tool:
scrapy crawl proxiedscript
However, your crawler process needs to read the project settings first if you want to launch the Scrapy application as a script.
Alternatively, define the DOWNLOADER_MIDDLEWARES setting as an argument for CrawlerProcess:
c = CrawlerProcess({
    'USER_AGENT': 'Mozilla/5.0',
    'DOWNLOADER_MIDDLEWARES': {
        # 'scrapy.contrib.downloadermiddleware.httpproxy.HttpProxyMiddleware': 110,  # deprecated in Scrapy 1.6
        'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 110,  # enabled by default
        'proxyspider.middleware.ProxiesMiddleware': 100,
    },
})
Upvotes: 1
Reputation: 51914
Perhaps return None instead of a Request? Returning a Request prevents any other downloader middlewares from running.
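With that fix applied, the middleware from the question would set the proxy and then return None, so the request continues down the middleware chain to the downloader:

```python
class ProxiesMiddleware(object):
    def process_request(self, request, spider):
        # Set the proxy for the built-in HttpProxyMiddleware to use,
        # then return None so processing continues through the
        # remaining downloader middlewares.
        request.meta['proxy'] = 'http://206.189.25.70:3128'
        return None
```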
Upvotes: 1