GolovDanil

Reputation: 133

Python 2.7, Requests library, can't get unicode

The documentation for the Requests library says that the text of a response returned by requests.get() is always unicode. But when I check which encoding was used, I see "windows-1251". That's a problem: when I try to print requests.get(url).text, I get an error, because the page at this URL contains Cyrillic characters.

import requests

url = 'https://www.weblancer.net/jobs/'
r = requests.get(url)
print r.encoding
print r.text

I got something like this:

windows-1251
UnicodeEncodeError: 'ascii' codec can't encode characters in position 256-263: ordinal not in range(128)

Is this a problem with Python 2.7, or is there no problem at all? Please help.

Upvotes: 0

Views: 2789

Answers (1)

GreenAsJade

Reputation: 14685

From the docs:

Requests will automatically decode content from the server. Most unicode charsets are seamlessly decoded.

When you make a request, Requests makes educated guesses about the encoding of the response based on the HTTP headers.

r.encoding (the encoding attribute of the response returned by requests.get()) tells you the encoding that was used to convert the byte stream from the server into the Unicode text in r.text.

In your case it is correct: the headers in the response say that the character set is windows-1251.
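As a rough illustration of that decoding step, here is a minimal stdlib-only sketch with no network call. The header value and the sample bytes are made up for the example, and this is not Requests' actual implementation, only the same idea: read the charset from the Content-Type header, then decode the raw bytes with it.

```python
from email.message import Message

# Hypothetical header value, mirroring what the server in the question reports.
headers = Message()
headers['Content-Type'] = 'text/html; charset=windows-1251'
charset = headers.get_content_charset()

# Hypothetical response body: a six-letter Russian word as windows-1251 bytes.
raw_body = b'\xcf\xf0\xe8\xe2\xe5\xf2'

# Roughly the relationship between r.content (bytes) and r.text (unicode).
text = raw_body.decode(charset)
```

Here raw_body plays the role of r.content and text the role of r.text.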

The error you are getting happens after that. Python 2 is trying to encode the Unicode text to ASCII in order to print it, and failing.

You can say print r.text.encode(r.encoding), which gives the same result as Padraic's suggestion in the comments, that is, printing r.content.
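Both the failure and the fix can be reproduced without a network call. This is a small sketch in which the sample string stands in for r.text; it assumes, as in the question, that the output stream's encoding is ASCII.

```python
# Cyrillic sample text, standing in for r.text (an assumption for illustration).
text = u'\u041f\u0440\u0438\u0432\u0435\u0442'

# What Python 2's print attempts implicitly when stdout's encoding is ASCII.
try:
    text.encode('ascii')
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# Encoding explicitly with the response's charset yields plain bytes,
# which can be written out without any implicit ASCII conversion.
encoded = text.encode('windows-1251')
matches_original_bytes = (encoded == b'\xcf\xf0\xe8\xe2\xe5\xf2')
```

If the terminal expects UTF-8 rather than windows-1251, text.encode('utf-8') would be the encoding to use for display instead.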


Note: r.encoding is assignable: if Requests guessed wrongly, you can set it to the correct encoding before accessing r.text.
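To see why the guess matters, consider a server that sends windows-1251 bytes but declares no charset. For text/* responses without a charset, Requests falls back to ISO-8859-1, and the text comes out garbled. The sketch below uses made-up sample bytes and plain bytes.decode() to show the difference; with Requests, the equivalent fix would be r.encoding = 'windows-1251' before reading r.text.

```python
# Hypothetical body: windows-1251 bytes served without a declared charset.
raw_body = b'\xcf\xf0\xe8\xe2\xe5\xf2'

# Decoding with the fallback guess succeeds but produces the wrong characters.
wrong = raw_body.decode('iso-8859-1')

# Decoding with the encoding the server actually used recovers the Cyrillic text.
right = raw_body.decode('windows-1251')
```

Note that the wrong guess raises no error, since every byte is valid ISO-8859-1, so the only symptom is garbled output.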

Upvotes: 3
