xavierds

Reputation: 1

Not receiving headers Scrapy ProxyMesh

I am quite new to Scrapy and ProxyMesh. My requests to the ProxyMesh server seem to be working: I can see my bandwidth consumption on the ProxyMesh website, and the proxy value in request.meta is correct in my logs. However, when I log the response headers in Scrapy, I do not receive the X-Proxymesh-IP header that I am supposed to receive. Here is my code. What am I doing wrong?

This is my middleware

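import logging


class Proxymesh(object):
    """Downloader middleware that routes every request through ProxyMesh."""

    def __init__(self):
        logging.debug('Initialized Proxymesh middleware')
        # Placeholder: replace with the actual ProxyMesh host and port.
        self.proxy_ip = 'http://host:port'

    def process_request(self, request, spider):
        logging.debug('Processing request through proxy IP: ' + self.proxy_ip)
        request.meta['proxy'] = self.proxy_ip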

These are my settings in my spider

custom_settings = {
    "DOWNLOADER_MIDDLEWARES": {
        "projectName.middlewares.proxymesh.Proxymesh": 1,
    },
}

This is what the response headers look like

['Set-Cookie']:['__cfduid=d88d4e4cb7... HttpOnly']
['Vary']:['User-Agent,Accept-Encoding']
['Server']:['cloudflare-nginx']
['Date']:['Thu, 19 Oct 2017 10...38:10 GMT']
['Cf-Ray']:['3b031b30cbef1565-CDG']
['Content-Type']:['text/html; charset=UTF-8']

Thank you for your help

Upvotes: 0

Views: 544

Answers (2)

hokedo

Reputation: 156

I don't know if this is still relevant, but I'm going to post it here. There is an issue with ProxyMesh and Scrapy or Python requests. When connecting to an HTTPS site through a proxy, a CONNECT request is first sent to the proxy service to create a tunnel, which then forwards the actual request. If the CONNECT succeeds, ProxyMesh adds the X-Proxymesh-IP header to its response to the CONNECT request. Scrapy misses this header entirely, because it only considers the response headers of the actual request.

This only happens with HTTPS requests, because the actual request and its response are encrypted inside the tunnel, so the proxy cannot add headers to them.
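To see where the header actually arrives, here is a minimal sketch that performs the CONNECT step by hand and parses the proxy's reply. The helper names parse_connect_response and connect_headers are hypothetical, and the sketch assumes the proxy accepts the connection without a Proxy-Authorization header (for example, via IP-based authentication); it is not how Scrapy or requests handle the tunnel internally.

```python
import socket
from typing import Dict, Tuple


def parse_connect_response(raw: bytes) -> Tuple[int, Dict[str, str]]:
    """Split a proxy's reply to CONNECT into a status code and headers."""
    # Only the header section (before the blank line) is of interest.
    head = raw.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    status_line, *header_lines = head.split("\r\n")
    status = int(status_line.split(" ", 2)[1])
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them lowercased.
        headers[name.strip().lower()] = value.strip()
    return status, headers


def connect_headers(proxy_host: str, proxy_port: int, target: str,
                    timeout: float = 10.0) -> Tuple[int, Dict[str, str]]:
    """Send CONNECT for `target` (e.g. "example.com:443") and return the reply.

    Assumption: no proxy authentication header is required.
    """
    request = f"CONNECT {target} HTTP/1.1\r\nHost: {target}\r\n\r\n".encode("ascii")
    with socket.create_connection((proxy_host, proxy_port), timeout=timeout) as sock:
        sock.sendall(request)
        raw = b""
        # Read until the end of the proxy's header section.
        while b"\r\n\r\n" not in raw:
            chunk = sock.recv(4096)
            if not chunk:
                break
            raw += chunk
    return parse_connect_response(raw)
```

Calling connect_headers with your own proxy host and port should return the headers of the CONNECT reply, which is where X-Proxymesh-IP would appear if the proxy sends it. Anything sent after this point travels inside the TLS tunnel, which is why the header never shows up in response.headers.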

References:

https://docs.proxymesh.com/article/74-proxy-server-headers-over-https

https://bugs.python.org/issue24964?fbclid=IwAR1c88hpLu2OdmEXlwfZfb2n8lMIqT8JvjLeO7pzsvFEiZBVlfJNpYZ4aFk

https://github.com/requests/requests/issues/3061?fbclid=IwAR34XDJa7dJqNpH33LRlvpoRHpaZJhVl75zXfFkEuBa7IjOVCoIxecW0zhw

Upvotes: 1

Umair Ayub

Reputation: 21271

Maybe you need to do this too?

DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 1,
}

Also, in your callback function, are you sure you are printing response.headers?

Upvotes: 0
