Reputation:
I have written a Python program that fetches movie/TV show information from the OMDb API (http://www.omdbapi.com/).
I get garbled output when printing the years a TV show ran. Here is the part of the code where this happens:
    keys = ['Title', 'Year', 'imdbRating', 'Director', 'Actors', 'Genre', 'totalSeasons']

    def jsonContent(self):
        payload = {'t': self.title}
        movie = requests.get(self.url, params=payload)
        return movie.json()

    def getInfo(self):
        data = self.jsonContent()
        for key, value in data.items():
            if key in keys:
                print key.encode('utf-8') + ' : ' + value.encode('utf-8')
For example, if I search for How I Met Your Mother, it prints this:
totalSeasons : 9
Title : How I Met Your Mother
imdbRating : 8.4
Director : N/A
Actors : Josh Radnor, Jason Segel, Cobie Smulders, Neil Patrick Harris
Year : 2005ΓÇô2014 #problem here
Genre : Comedy, Romance
How can I fix this?
Upvotes: 0
Views: 1118
Reputation: 1123072
You are encoding Unicode text to UTF-8 before printing:
print key.encode('utf-8') + ' : ' + value.encode('utf-8')
However, your console or terminal is not configured to interpret UTF-8. It is being sent bytes, and it then displays characters based on a different codec altogether.
Your value contains a \u2013 or U+2013 EN DASH character, which encodes to UTF-8 as the 3 bytes E2 80 93, which your terminal appears to decode as Windows Codepage 437 instead:
>>> value = u'2005\u20132014'
>>> print value
2005–2014
>>> print value.encode('utf8').decode('cp437')
2005ΓÇô2014
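The same diagnosis can be checked at the byte level. The following is a minimal sketch, not part of the original code: `utf8_hex` and `as_displayed` are hypothetical helper names, and the `__future__` import is only there so the snippet runs on both Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals

import binascii


def utf8_hex(text):
    """Return the UTF-8 encoding of text as a lowercase hex string."""
    return binascii.hexlify(text.encode('utf-8')).decode('ascii')


def as_displayed(text, console_encoding='cp437'):
    """Simulate a console that receives UTF-8 bytes but reads them
    using console_encoding (assumed here to be CP437)."""
    return text.encode('utf-8').decode(console_encoding)


if __name__ == '__main__':
    # The en dash U+2013 becomes three bytes in UTF-8.
    print(utf8_hex('\u2013'))
```

Calling `as_displayed(u'2005\u20132014')` yields the same garbled string as in the question, which suggests that the mismatch between UTF-8 and CP437 is what produces it.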
Either re-configure your console or terminal, or set the PYTHONIOENCODING environment variable to use an error handler:
PYTHONIOENCODING=cp437:replace
The :replace part tells Python to encode to cp437 but to use placeholders for characters it can't handle. You'll get a question mark instead:
>>> print value.encode('cp437', 'replace')
2005?2014
Note that I have to encode to CP437 explicitly in all these examples because my own terminal is set up differently. You don't need to, as Python has detected your configuration and will do this automatically for you. Just stick to printing Unicode directly, without calling .encode() first.
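Applied to the loop from the question, printing Unicode directly might look like the sketch below. It is an illustration under assumptions rather than the original program: `format_info` is a hypothetical stand-in for `getInfo`, the `sample` dictionary stands in for the result of `movie.json()`, and the `__future__` imports are only there so the code runs on both Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals

KEYS = ['Title', 'Year', 'imdbRating', 'Director', 'Actors', 'Genre', 'totalSeasons']


def format_info(data, keys=KEYS):
    """Build one 'key : value' text line per wanted key.

    The lines stay Unicode text; nothing is encoded here, so print()
    can encode them for whatever console Python has detected.
    """
    return ['{0} : {1}'.format(key, data[key]) for key in keys if key in data]


# Hypothetical sample standing in for the decoded JSON response.
sample = {
    'Title': 'How I Met Your Mother',
    'Year': '2005\u20132014',
    'Response': 'True',
}

if __name__ == '__main__':
    for line in format_info(sample):
        print(line)  # no .encode('utf-8') before printing
```

Iterating over `KEYS` instead of `data.items()` is a small design choice in this sketch: it also makes the output order predictable, which the original loop does not guarantee.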
Another alternative is to use the Unidecode package to replace non-ASCII characters with close approximations; it'll replace the en dash with an ASCII hyphen:
>>> from unidecode import unidecode
>>> value
u'2005\u20132014'
>>> unidecode(value)
'2005-2014'
Upvotes: 3