Is it possible to peek at the data in a urllib2 response?

Question

I need to detect character encoding in HTTP responses. To do this I look at the headers, then if it's not set in the content-type header I have to peek at the response and look for a "<meta>" header. I'd like to be able to write a function that looks and works something like this:

response = urllib2.urlopen("http://www.example.com/")
encoding = detect_html_encoding(response)
...
page_text = response.read()

However, if I do response.read() in my "detect_html_encoding" function, then the subsequent response.read() after the call to my function will fail.

Is there an easy way to peek at the response and/or rewind after a read?

Alex Martelli · Accepted Answer

def detectit(response):
    # try headers &c, then, worst case...:
    content = response.read()
    # rebind read() on this instance so later calls return the cached body
    response.read = lambda: content
    # now detect based on content

The trick of course is ensuring that response.read() WILL return the same thing again if needed... that's why we assign that lambda to it if necessary, i.e., if we already needed to extract the content -- that ensures the same content can be extracted again (and again, and again, ...;-).
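For illustration, here is a minimal sketch of how the question's detect_html_encoding could be filled in around that trick. It is not a complete detector. The regular expression, the 1024-byte scan window, and the "utf-8" default are assumptions made for the example. It also assumes the response exposes a headers object with a get() method, and that attributes can be assigned on the response instance.

```python
import re

# Hypothetical pattern for this sketch: finds "charset=..." in raw bytes.
CHARSET_BYTES_RE = re.compile(br'charset=["\']?([A-Za-z0-9_.:-]+)', re.IGNORECASE)
# Same idea for the text of the Content-Type header.
CHARSET_TEXT_RE = re.compile(r'charset=["\']?([A-Za-z0-9_.:-]+)', re.IGNORECASE)


def detect_html_encoding(response, default="utf-8"):
    """Return a lowercase encoding name without consuming the response.

    Assumes `response` has `.headers.get(name, default)` and `.read()`.
    """
    # 1. Cheap path: the Content-Type header already names a charset.
    content_type = response.headers.get("Content-Type", "") or ""
    match = CHARSET_TEXT_RE.search(content_type)
    if match:
        return match.group(1).lower()

    # 2. Worst case: read the body once and cache it on the instance,
    #    so later response.read() calls return the same bytes.
    content = response.read()
    response.read = lambda: content

    # 3. Look for a charset declaration near the top of the document
    #    (the 1024-byte window is an arbitrary choice for this sketch).
    match = CHARSET_BYTES_RE.search(content[:1024])
    if match:
        return match.group(1).decode("ascii").lower()

    # 4. Nothing declared: fall back to the assumed default.
    return default
```

With this in place, the calling code from the question stays unchanged: response.read() after detect_html_encoding(response) returns the full body whether or not the function had to read it. Note that the replacement read takes no size argument, so callers that use response.read(n) would need a different wrapper.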
