Reputation: 15286
I need to detect the character encoding of HTTP responses. To do this I look at the headers first; if the encoding is not set in the Content-Type header, I have to peek at the response body and look for a "<meta http-equiv='content-type'>" tag. I'd like to be able to write a function that looks and works something like this:
response = urllib2.urlopen("http://www.example.com/")
encoding = detect_html_encoding(response)
...
page_text = response.read()
However, if I call response.read() in my "detect_html_encoding" function, then the subsequent response.read() after the call to my function will fail.
Is there an easy way to peek at the response and/or rewind after a read?
Upvotes: 1
Views: 518
Reputation: 881735
def detectit(response):
    # try headers &c, then, worst case...:
    content = response.read()
    response.read = lambda: content
    # now detect based on content
The trick of course is ensuring that response.read() WILL return the same thing again if needed... that's why we assign that lambda to it if necessary, i.e., if we already needed to extract the content -- that ensures the same content can be extracted again (and again, and again, ...;-).
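To see the effect of the lambda trick without touching the network, here is a minimal sketch. FakeResponse and cache_body are hypothetical names made up for this illustration; FakeResponse only imitates the one-shot read() of a real response object.

```python
import io


class FakeResponse(object):
    """Hypothetical stand-in for a response: read() drains the stream once."""

    def __init__(self, body):
        self._stream = io.BytesIO(body)

    def read(self):
        return self._stream.read()


def cache_body(response):
    """Read the body once, then make later read() calls return it again."""
    content = response.read()
    # Shadow the bound method with a function that replays the cached bytes.
    response.read = lambda: content
    return content


# Without caching, the second read() finds the stream already drained.
plain = FakeResponse(b"<html><body>hi</body></html>")
plain_first = plain.read()
plain_second = plain.read()  # empty bytes

# With caching, every later read() returns the same content.
cached = FakeResponse(b"<html><body>hi</body></html>")
first = cache_body(cached)
second = cached.read()
third = cached.read()
```

One caveat with this approach: the replacement takes no arguments, so code that later calls response.read(n) with a size would raise a TypeError.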
Upvotes: 4
Reputation: 75427
Use response.info() to detect the encoding from the headers. If you want to parse the HTML, save the response data:
page_text = response.read()
encoding = detect_html_encoding(response, page_text)
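A rough sketch of what detect_html_encoding(response, page_text) could look like under this approach. It assumes response.info() returns an object with a dict-like get() for header lookup and that page_text is a byte string. The regular expressions, the 4096-byte scan window, the default parameter and the FakeResponse class are all assumptions made for illustration, not a complete encoding-sniffing algorithm.

```python
import re

# Matches "charset=..." inside a Content-Type header value.
HEADER_CHARSET_RE = re.compile(
    r"charset\s*=\s*[\"']?([A-Za-z0-9_.:-]+)", re.IGNORECASE)

# Matches a charset declared inside a <meta ...> tag of the page bytes.
META_CHARSET_RE = re.compile(
    br"<meta[^>]+charset\s*=\s*[\"']?([A-Za-z0-9_.:-]+)", re.IGNORECASE)


def detect_html_encoding(response, page_text, default=None):
    """Prefer the Content-Type header, then fall back to a <meta> tag."""
    content_type = response.info().get("Content-Type") or ""
    match = HEADER_CHARSET_RE.search(content_type)
    if match:
        return match.group(1).lower()
    # Only scan the start of the document, where <meta> tags normally live.
    match = META_CHARSET_RE.search(page_text[:4096])
    if match:
        return match.group(1).decode("ascii").lower()
    return default


class FakeResponse(object):
    """Hypothetical response whose info() returns a plain dict of headers."""

    def __init__(self, headers):
        self._headers = headers

    def info(self):
        return self._headers


from_header = detect_html_encoding(
    FakeResponse({"Content-Type": "text/html; charset=ISO-8859-1"}),
    b"<html></html>")

from_meta = detect_html_encoding(
    FakeResponse({"Content-Type": "text/html"}),
    b"<html><head><meta http-equiv='content-type' "
    b"content='text/html; charset=utf-8'></head></html>")

unknown = detect_html_encoding(FakeResponse({}), b"<html></html>")
```

Because the body is read once by the caller and passed in, nothing here calls response.read(), so the rewind problem never comes up.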
Upvotes: 0