Reputation: 1
I am quite new to Scrapy and ProxyMesh. My requests to the ProxyMesh server seem to be working: I can see my bandwidth consumption on the ProxyMesh website, and request.meta['proxy'] is correct in my logs. However, when I log the response headers in Scrapy, I do not see the X-Proxymesh-IP header that I am supposed to receive. Here is my code. What am I doing wrong?
This is my middleware:
import logging

class Proxymesh(object):
    def __init__(self):
        logging.debug('Initialized Proxymesh middleware')
        self.proxy_ip = 'http://host:port'

    def process_request(self, request, spider):
        # Route every outgoing request through the proxy
        logging.debug('Processing request through proxy IP: ' + self.proxy_ip)
        request.meta['proxy'] = self.proxy_ip
These are my settings in my spider:
custom_settings = {
    "DOWNLOADER_MIDDLEWARES": {
        "projectName.middlewares.proxymesh.Proxymesh": 1,
    }
}
This is what the response headers look like:
['Set-Cookie']:['__cfduid=d88d4e4cb7... HttpOnly']
['Vary']:['User-Agent,Accept-Encoding']
['Server']:['cloudflare-nginx']
['Date']:['Thu, 19 Oct 2017 10...38:10 GMT']
['Cf-Ray']:['3b031b30cbef1565-CDG']
['Content-Type']:['text/html; charset=UTF-8']
Thank you for your help
Upvotes: 0
Views: 544
Reputation: 156
I don't know if this is still relevant, but I'm going to post it here anyway. There's an issue with ProxyMesh and Scrapy (or Python requests). When connecting to an HTTPS site through a proxy, the client first sends a CONNECT request to the proxy service in order to create a tunnel, which then forwards the actual request. If the CONNECT succeeds, ProxyMesh adds the X-Proxymesh-IP header to the CONNECT request's confirmation response. Scrapy misses this header entirely, because it only takes into account the response headers of the actual request.
This only happens with HTTPS requests: the actual request and its response are encrypted inside the tunnel, so the proxy cannot add headers to them.
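To make the point above concrete, here is a minimal sketch of where the header lives. It only parses the proxy's reply to CONNECT from raw bytes; it does not talk to ProxyMesh or to Scrapy. The function name parse_connect_reply and the sample reply (including the IP, which is a documentation address) are assumptions made up for illustration, not real ProxyMesh output.

```python
def parse_connect_reply(raw):
    """Split a proxy's reply to CONNECT into (status_code, headers).

    Hypothetical helper: this is the data an HTTPS client sees *before*
    the TLS tunnel starts, and that Scrapy does not expose on the response.
    """
    head, _, _ = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # The status line looks like: HTTP/1.1 200 Connection established
    status_code = int(lines[0].split(" ", 2)[1])
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so normalise to lowercase
        headers[name.strip().lower()] = value.strip()
    return status_code, headers


# Made-up sample reply, for illustration only
sample = (
    b"HTTP/1.1 200 Connection established\r\n"
    b"X-Proxymesh-IP: 203.0.113.7\r\n"
    b"\r\n"
)

status, headers = parse_connect_reply(sample)
print(status, headers.get("x-proxymesh-ip"))
```

Everything sent after this reply is encrypted end to end with the target site, which is why the header never shows up in response.headers inside a Scrapy callback.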
References:
https://docs.proxymesh.com/article/74-proxy-server-headers-over-https
Upvotes: 1
Reputation: 21271
Maybe you need to do this too?
DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 1,
}
Also, are you sure you are printing response.headers in your callback function?
Upvotes: 0