Reputation: 5536
I need to access the original HTTP request that the browser sends to the server in web.py.
E.g., this is the request that Chromium issues when I surf to some page:
$ nc -l 8081
GET / HTTP/1.1
Host: 127.0.0.1:8081
Connection: keep-alive
Cache-Control: max-age=0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
User-Agent: Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.22 (KHTML, like Gecko) Ubuntu Chromium/25.0.1364.160 Chrome/25.0.1364.160 Safari/537.22
Accept-Encoding: gzip,deflate,sdch
Accept-Language: it-IT,it;q=0.8,en-US;q=0.6,en;q=0.4
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3
I tried to get that from web.ctx.env, but that is a dictionary (while I'd prefer the original raw text of the request) and it is mixed with other data:
SERVER_SOFTWARE: CherryPy/3.2.0 Server
SCRIPT_NAME:
ACTUAL_SERVER_PROTOCOL: HTTP/1.1
REQUEST_METHOD: GET
PATH_INFO: /
SERVER_PROTOCOL: HTTP/1.1
QUERY_STRING:
HTTP_ACCEPT_CHARSET: ISO-8859-1,utf-8;q=0.7,*;q=0.3
HTTP_USER_AGENT: Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.22 (KHTML, like Gecko) Ubuntu Chromium/25.0.1364.160 Chrome/25.0.1364.160 Safari/537.22
HTTP_CONNECTION: keep-alive
REMOTE_PORT: 55409
SERVER_NAME: localhost
REMOTE_ADDR: 127.0.0.1
wsgi.url_scheme: http
SERVER_PORT: 8081
wsgi.input: <web.wsgiserver.KnownLengthRFile object at 0x940b16c>
HTTP_HOST: 127.0.0.1:8081
wsgi.multithread: True
REQUEST_URI: /
HTTP_ACCEPT: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
wsgi.version: (1, 0)
wsgi.run_once: False
wsgi.errors: <open file '<stderr>', mode 'w' at 0xb73010d0>
wsgi.multiprocess: False
HTTP_ACCEPT_LANGUAGE: it-IT,it;q=0.8,en-US;q=0.6,en;q=0.4
HTTP_ACCEPT_ENCODING: gzip,deflate,sdch
This is the code I used to obtain the output above:
#!/usr/bin/env python
import web

urls = ('(.*)', 'urlhandler')

class urlhandler:
    def GET(self, url):
        # Dump every key/value pair of the WSGI environ as plain text
        txt = ""
        for k, v in web.ctx.env.items():
            txt += ": ".join([k, str(v)]) + "\n"
        return txt

if __name__ == '__main__':
    app = web.application(urls, globals())
    app.run()
Should I strip the unwanted data from this dictionary, or is there a straightforward way to get the original request?
Upvotes: 2
Views: 4281
Reputation: 5536
Following Andrey's suggestion I came up with this code. It tries to reconstruct the web request; this may not be the best way to get it, but it is the only way I have found so far.
This program displays the request for the requested page (it works for both POST and GET requests):
#!/usr/bin/env python
import web
from urllib import quote  # Python 2 location of quote()

urls = ('(.*)', 'urlhandler')

def adaptHeader(txt):
    """Input: string, header name as it is in web.ctx.env
    Output: string, header name according to http protocol.
    e.g.: "HTTP_CACHE_CONTROL" => "Cache-Control"
    """
    # strip only the leading prefix, not every occurrence of 'HTTP_'
    if txt.startswith('HTTP_'):
        txt = txt[len('HTTP_'):]
    return '-'.join(t.capitalize() for t in txt.split('_'))

def rawRequest(env):
    """Reconstruct and return the web request based on web.ctx.env"""
    # url reconstruction
    # see http://www.python.org/dev/peps/pep-0333/#url-reconstruction
    url = env['wsgi.url_scheme'] + '://'  # http/https
    url += env.get('HTTP_HOST') or (env['SERVER_NAME'] + ':' + env['SERVER_PORT'])  # host + port
    url += quote(env.get('SCRIPT_NAME', ''))
    url += quote(env.get('PATH_INFO', ''))
    url += ('?' + env['QUERY_STRING']) if env.get('QUERY_STRING') else ''  # GET querystring
    # get/post request
    req = ' '.join((env['REQUEST_METHOD'], url, env['SERVER_PROTOCOL'])) + '\n'
    # headers
    for k, v in env.items():
        if k.startswith('HTTP_') or k in ('CONTENT_TYPE', 'CONTENT_LENGTH'):
            req += adaptHeader(k) + ': ' + str(v) + '\n'
    # post data
    try:
        req += '\n' + env['wsgi.input'].read(int(env['CONTENT_LENGTH']))
    except (KeyError, ValueError):
        # CONTENT_LENGTH is missing or empty: there is no body to append
        pass
    return req

class urlhandler:
    def GET(self, url):
        return rawRequest(web.ctx.env)

    def POST(self, url):
        return rawRequest(web.ctx.env)

if __name__ == '__main__':
    app = web.application(urls, globals())
    app.run()
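One difference from the nc capture in the question: the code above builds the request line from the full reconstructed URL, while the browser sent only the path ("GET / HTTP/1.1"). If the goal is to get closer to what was on the wire, a variant along these lines might work. It is only a sketch: request_line is a hypothetical helper name, it assumes the client used the usual path-only form, and it is still a reconstruction from the environ, not the original bytes.

```python
try:
    from urllib import quote          # Python 2
except ImportError:
    from urllib.parse import quote    # Python 3


def request_line(env):
    """Build 'METHOD path[?query] PROTOCOL' from a WSGI environ dict.

    Hypothetical helper: it assumes the client sent a path-only
    request target, which is what the nc capture shows.
    """
    path = quote(env.get('SCRIPT_NAME', '')) + quote(env.get('PATH_INFO', ''))
    if env.get('QUERY_STRING'):
        path += '?' + env['QUERY_STRING']
    # An empty path is sent as "/" on the wire
    return ' '.join((env['REQUEST_METHOD'], path or '/', env['SERVER_PROTOCOL']))
```

In rawRequest it would take the place of the ' '.join(...) expression, e.g. req = request_line(env) + '\n'.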
Upvotes: 2
Reputation: 4479
Looking at what you have, you could filter web.ctx.env by the keys that start with "HTTP_". That would be easier than obtaining and parsing the raw request headers.
You can check the WSGI spec here: http://www.python.org/dev/peps/pep-0333/#environ-variables
HTTP_ Variables
    Variables corresponding to the client-supplied HTTP request headers (i.e., variables whose names begin with "HTTP_"). The presence or absence of these variables should correspond with the presence or absence of the appropriate HTTP header in the request.
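A minimal sketch of that filtering, assuming a plain WSGI environ dict such as web.ctx.env. The function name http_headers is made up for illustration. It also picks up CONTENT_TYPE and CONTENT_LENGTH, which the same PEP lists without the "HTTP_" prefix; that goes slightly beyond filtering on the prefix alone.

```python
def http_headers(env):
    """Return only the client-supplied headers from a WSGI environ dict.

    Hypothetical helper: keys are turned back into header-style names,
    e.g. 'HTTP_USER_AGENT' -> 'User-Agent'.
    """
    headers = {}
    for key, value in env.items():
        if key.startswith('HTTP_'):
            name = key[len('HTTP_'):]
        elif key in ('CONTENT_TYPE', 'CONTENT_LENGTH') and value:
            # These two headers are stored without the HTTP_ prefix
            name = key
        else:
            continue  # server, wsgi.* and other CGI-style entries
        headers[name.replace('_', '-').title()] = value
    return headers
```

Inside a handler this would be called as http_headers(web.ctx.env). The original capitalization and order of the headers cannot be recovered this way, so the result is a cleaned-up view of the request rather than the raw text.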
Upvotes: 1