govindreddy

Reputation: 21

A Python 2.7 script using urllib2 and json raises a Unicode error

import json
import urllib2
url='http://search.twitter.com/search.json?q=python'
open=urllib2.urlopen(url)
response=open.read().encode('utf8')
data=json.loads(response)
results=data['results']
for result in results:
  print result['from_user'] + ': ' + result['text'] + '\n'

It gives the error UnicodeEncodeError: 'charmap' codec can't encode characters in position 16-24: character maps to <undefined>.

Anyone have a solution for this?

Upvotes: 1

Views: 511

Answers (1)

udoprog

Reputation: 1865

You probably want to decode the response, not encode it.

A very short explanation: an HTTP server cannot send Unicode characters, only bytes. It therefore uses an encoding such as UTF-8 to translate the characters into bytes. When you receive a response from the server you receive that chunk of bytes, and to turn it back into a sequence of Unicode characters (a unicode object in Python 2) you have to decode it.
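As a rough illustration of the decode step, here is a minimal sketch. The `raw` value is a made-up stand-in for what `read()` returns, not real output from the search URL, and it assumes the server sent UTF-8.

```python
import json

# Hypothetical stand-in for the bytes that urllib2.urlopen(url).read() returns.
raw = b'{"results": [{"from_user": "caf\xc3\xa9", "text": "d\xc3\xa9j\xc3\xa0 vu"}]}'

# Decode: UTF-8 bytes -> unicode text. (Assumes the server really sent UTF-8.)
text = raw.decode('utf-8')

# json.loads then yields unicode strings for every key and value.
data = json.loads(text)
user = data['results'][0]['from_user']
tweet = data['results'][0]['text']
```

Text has to be encoded again on the way out. The 'charmap' wording in the error suggests the failure may happen when print encodes the text for the console, whose encoding cannot represent every character; that is a guess from the message, not something the traceback shown here confirms.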

What adds to the confusion is that UTF-8 encodes the ASCII characters (code points below 128) as exactly the same single bytes that ASCII uses. For those characters the encoded byte equals the code point and fits in one byte, so a mix-up between encoding and decoding goes unnoticed until a non-ASCII character shows up.
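A small sketch of that overlap, using made-up sample strings: pure ASCII text encodes to the same bytes under both codecs, while a single accented character does not.

```python
ascii_text = u'python'
accented = u'caf\xe9'  # "cafe" with an acute accent; \xe9 is code point U+00E9

# For pure ASCII text, UTF-8 and ASCII produce identical bytes.
same_bytes = ascii_text.encode('utf-8') == ascii_text.encode('ascii')

# A non-ASCII character needs more than one byte in UTF-8.
utf8_bytes = accented.encode('utf-8')
lengths = (len(accented), len(utf8_bytes))  # 4 characters, 5 bytes

# The ASCII codec cannot represent it at all.
try:
    accented.encode('ascii')
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True
```

This is why code that handles English-only test data can appear correct and then fail on the first tweet containing an accented letter or emoji.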

Hope this is helpful.

Upvotes: 3
