Reputation: 33
I am experimenting with the Twitter API for Python and have run into a character encoding/decoding issue; when I am collecting tweets for a user (@BBCWorld in this instance), if there is special punctuation I receive the following error:
286952044814794753 : Traceback (most recent call last):
  File "C:\Python27\lib\encodings\cp850.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\u201c' in position 0: character maps to <undefined>
Note: The long number at the start is the ID of the tweet causing the error.
The specific character causing this problem is a curly (opening) double quotation mark, U+201C (like those used in MS Word). Is there a way to display such punctuation in a compatible form? Ideally I want to sanitise tweets to overcome this kind of error by replacement, thereby maintaining context, rather than omitting characters.
This is the core of the code:
tweets = api.GetUserTimeline('BBCWorld')
try:
    for tweet in tweets:
        print tweet.id, ": ", tweet.text
except UnicodeEncodeError as uee:
    print uee
Thanks for any pointers,
Milutin
Upvotes: 3
Views: 2442
Reputation: 13432
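This does not seem to be a problem with python-twitter, or with Python for that matter; it's a problem with the Windows command prompt (cmd). The traceback shows the console is using the cp850 code page, which has no mapping for U+201C.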
If you try this under a suitable Unix terminal, this is what you get:
>>> import twitter
>>> api = twitter.Api()
>>> print api.GetStatus('286952044814794753').text
“How do you change mindsets at a societal level, in a country of 1.2bn people?” - Viewpoints from India http://t.co/RiP4t71q #Delhigangrape
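The difference comes from the terminal's encoding rather than from the tweet itself. Here is a minimal sketch of that check, assuming the Windows console uses cp850 (as the traceback suggests) and the Unix terminal uses UTF-8. `can_encode` is a hypothetical helper name, not part of python-twitter.

```python
# -*- coding: utf-8 -*-
# Sketch: test whether a codec can represent a piece of text.
# `can_encode` is a hypothetical helper, not part of python-twitter.

def can_encode(text, encoding):
    """Return True if every character in `text` can be encoded with `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

# Curly quotes (U+201C / U+201D), as in the tweet that triggered the error.
quote = u"\u201cHow do you change mindsets\u201d"

cp850_ok = can_encode(quote, "cp850")  # code page named in the traceback
utf8_ok = can_encode(quote, "utf-8")   # assumed encoding of the Unix terminal

print(cp850_ok)
print(utf8_ok)
```

The same string fails for cp850 and succeeds for UTF-8, which is why the `print` works in one terminal and raises `UnicodeEncodeError` in the other.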
Take a look at this question for a discussion of how to deal with this under Windows: Unicode not printing correctly to cp850 (cp437), play card suits
My best bet for you would be to change your console font and code page to Unicode-compliant ones, as outlined here: https://stackoverflow.com/a/4234515/679897 or here: http://www.velocityreviews.com/forums/t717717-python-unicode-and-windows-cmd-exe.html
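If changing the console is not an option, the replacement approach asked about in the question could look roughly like the sketch below. It assumes `tweet.text` is a Unicode string. The `PUNCTUATION_MAP` table and the `sanitise` function are hypothetical names made up for illustration, and the mapping is only a small starting set that you would extend as needed.

```python
# -*- coding: utf-8 -*-
import sys

# Hypothetical mapping from typographic punctuation to plain ASCII equivalents.
# Keys are code points, as expected by the translate() method of Unicode strings.
PUNCTUATION_MAP = {
    0x2018: u"'",    # left single quotation mark
    0x2019: u"'",    # right single quotation mark
    0x201C: u'"',    # left double quotation mark
    0x201D: u'"',    # right double quotation mark
    0x2013: u"-",    # en dash
    0x2014: u"-",    # em dash
    0x2026: u"...",  # horizontal ellipsis
}

def sanitise(text, encoding=None):
    """Return `text` with known punctuation replaced and the rest made printable.

    Characters in PUNCTUATION_MAP are swapped for ASCII look-alikes; anything
    else the target encoding cannot represent becomes '?' instead of raising.
    """
    # Fall back to ASCII if the console encoding is unknown (assumption).
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "ascii"
    replaced = text.translate(PUNCTUATION_MAP)
    # Round-trip through the target encoding so unmapped characters become '?'.
    return replaced.encode(encoding, "replace").decode(encoding)

sample = sanitise(u"\u201cHi\u201d \u2013 ok\u2026", "cp850")
print(sample)
```

In the loop from the question, this would be used as `print tweet.id, ": ", sanitise(tweet.text)`. The context of the tweet is kept for the mapped punctuation, and any other unsupported character degrades to `?` rather than aborting the loop.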
Upvotes: 3