7a_

Reputation: 41

Python raw http request retrieval issues via urllib2

I am using Python 2.6.5 and trying to capture the raw HTTP request that urllib2 sends. This works fine until I add a proxy handler into the mix: with a ProxyHandler installed, the raw HTTP request is still captured, but the raw HTTPS request is not.

Similar questions I have found are close but do not solve my problem.

This is what I am doing:

import httplib
import urllib2

RawRequest = None  # Most recent data passed to send() by the classes below

class MyHTTPConnection(httplib.HTTPConnection):
    def send(self, s):
        global RawRequest
        RawRequest = s  # Saving to global variable for Requester class to see
        httplib.HTTPConnection.send(self, s)

class MyHTTPHandler(urllib2.HTTPHandler):
    def http_open(self, req):
        return self.do_open(MyHTTPConnection, req)

class MyHTTPSConnection(httplib.HTTPSConnection):
    def send(self, s):
        global RawRequest
        RawRequest = s  # Saving to global variable for Requester class to see
        httplib.HTTPSConnection.send(self, s)

class MyHTTPSHandler(urllib2.HTTPSHandler):
    def https_open(self, req):
        return self.do_open(MyHTTPSConnection, req)

Requester class:

global RawRequest
ProxyConf = { 'http':'http://127.0.0.1:8080', 'https':'http://127.0.0.1:8080' }
# With only ProxyConf = { 'http':'http://127.0.0.1:8080' }, the raw HTTPS request is captured,
# BUT the proxy does not see the HTTPS request!
# Also tried, with similar results: ProxyConf = { 'http':'http://127.0.0.1:8080', 'https':'https://127.0.0.1:8080' }
ProxyHandler = urllib2.ProxyHandler(ProxyConf)
urllib2.install_opener(urllib2.build_opener(ProxyHandler, MyHTTPHandler, MyHTTPSHandler))
# urlopen() actually sends the request; urllib2.Request() alone only builds the object
urllib2.urlopen('http://www.google.com', None)  # global RawRequest updated
# This is the problem: global RawRequest NOT updated!?
urllib2.urlopen('https://accounts.google.com', None)

But if I remove the ProxyHandler, it works:

global RawRequest
urllib2.install_opener(urllib2.build_opener(MyHTTPHandler, MyHTTPSHandler))
urllib2.urlopen('http://www.google.com', None)  # global RawRequest updated
urllib2.urlopen('https://accounts.google.com', None)  # global RawRequest updated

How can I add the ProxyHandler into the mix while keeping access to the RawRequest?

Thank you in advance.

Upvotes: 3

Views: 1105

Answers (1)

7a_

Reputation: 41

Answering my own question: this seems to be a bug in the underlying libraries, and making RawRequest a list solves the problem. The raw HTTP request is the first item in the list. The send() method of the custom HTTPS class is called several times, and the last call is blank, so a plain assignment ends up overwriting the request. The fact that the custom HTTP class is only called once suggests this is a bug in Python, but the list solution gets around it.

RawRequest = s

just needs to be changed to:

RawRequest.append(s)

RawRequest must first be initialised with RawRequest = [], and the raw request is then retrieved via RawRequest[0] (the first element of the list).
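
For anyone who wants to see the effect without a proxy, here is a minimal offline sketch of why appending works where assignment does not. Note the assumptions: it is written for Python 3, where httplib was renamed http.client (the code in the question targets Python 2.6), and FakeSocket, RecordingHTTPConnection, the host example.invalid and the /login path are hypothetical names made up for this example. Nothing is sent over the network.

```python
import http.client


class FakeSocket(object):
    """Hypothetical stand-in for a real socket: discards everything written."""

    def sendall(self, data):
        pass


class RecordingHTTPConnection(http.client.HTTPConnection):
    """Keeps every chunk passed to send() instead of only the latest one."""

    def __init__(self, *args, **kwargs):
        http.client.HTTPConnection.__init__(self, *args, **kwargs)
        self.raw_chunks = []

    def send(self, data):
        # Append rather than assign, so a later send() call cannot
        # overwrite what an earlier call recorded.
        self.raw_chunks.append(data)
        http.client.HTTPConnection.send(self, data)


conn = RecordingHTTPConnection("example.invalid")
conn.sock = FakeSocket()  # pre-set socket so send() never opens a connection
conn.request("POST", "/login", body=b"user=alice")

# Joining the chunks rebuilds the request however many send() calls were made.
raw_request = b"".join(conn.raw_chunks)
print(len(conn.raw_chunks), "send() call(s)")
print(raw_request.decode("ascii"))
```

The library may call send() once or several times for a single request, depending on the version and on whether there is a body, so the sketch joins all recorded chunks rather than relying on a fixed count. In the Python 2 code above, the equivalent change is RawRequest.append(s).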

Upvotes: 1
