Reputation: 16526
I'm trying to pipe a list that I've scraped from http://www.ropeofsilicon.com/roger-eberts-great-movies-list/ through the API at http://www.omdbapi.com/ to grab the movies' IMDb IDs.
I'm logging the movies that I can and can't find as follows:
import requests

OMDBPath = "http://www.omdbapi.com/"
movieFile = open("movies.txt")
foundLog = open("log_found.txt", 'w')
notFoundLog = open("log_not_found.txt", 'w')

####

for line in movieFile:
    name = line.split('(')[0].decode('utf8')
    print name
    year = False
    if line.find('(') != -1:
        year = line[line.find('(')+1 : line.find(')')].decode('utf8')
        OMDBQuery = {'t': name, 'y': year}
    else:
        OMDBQuery = {'t': name}
    req = requests.get(OMDBPath, params=OMDBQuery)
    if req.json()[u'Response'] == "False":
        if year:
            notFoundLog.write("Couldn't find " + name + " (" + year + ")" + "\n")
        else:
            notFoundLog.write("Couldn't find " + name + "\n")
    # else:
    #     print req.json()
    #     foundLog.write(req.text.decode('utf8').encode('latin1') + ",")

movieFile.close()
foundLog.close()
notFoundLog.close()
I've been reading a lot about Unicode encoding and decoding, and it looks like this is happening because I'm not encoding the file correctly? I'm not sure what's wrong here. The script fails when it gets to "Caché":
Caché
Traceback (most recent call last):
  File "app.py", line 34, in <module>
    notFoundLog.write("Couldn't find " + name + " (" + year + ")" + "\n")
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 18: ordinal not in range(128)
Upvotes: 0
Views: 45
Reputation: 94951
Here is a working solution that relies on the codecs module to provide transparent encoding/decoding to/from UTF-8 for the various files you open:
import requests
import codecs

OMDBPath = "http://www.omdbapi.com/"

with codecs.open("movies.txt", encoding='utf-8') as movieFile, \
     codecs.open("log_found.txt", 'w', encoding='utf-8') as foundLog, \
     codecs.open("log_not_found.txt", 'w', encoding='utf-8') as notFoundLog:
    for line in movieFile:
        name = line.split('(')[0]
        print(name)
        year = False
        if line.find('(') != -1:
            year = line[line.find('(')+1 : line.find(')')]
            OMDBQuery = {'t': name, 'y': year}
        else:
            OMDBQuery = {'t': name}
        req = requests.get(OMDBPath, params=OMDBQuery)
        if req.json()[u'Response'] == "False":
            if year:
                notFoundLog.write(u"Couldn't find {} ({})\n".format(name, year))
            else:
                notFoundLog.write(u"Couldn't find {}\n".format(name))
        #else:
            #print(req.json())
            #foundLog.write(u"{},".format(req.text))
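For context on why the original script fails: in Python 2, writing a unicode string to a file opened with the plain built-in open triggers an implicit encode with the ASCII codec, and é (u'\xe9') has no ASCII representation. The sketch below reproduces the same codec failure explicitly. It is written for Python 3 (an assumption, since the question uses Python 2), where the encode step has to be spelled out, and the variable names are illustrative only.

```python
# Illustrative sketch: reproduce the codec failure from the traceback explicitly.
message = u"Couldn't find Caché"

error = None
try:
    # Roughly what Python 2 attempts implicitly when a unicode string
    # is written to a file opened with the plain built-in open.
    message.encode("ascii")
except UnicodeEncodeError as exc:
    error = exc
    print("ASCII encode failed at position", exc.start)

# Encoding with UTF-8 succeeds; é becomes the two bytes 0xC3 0xA9.
encoded = message.encode("utf-8")
```

Opening the log files with an explicit UTF-8 encoding, as in the solution above, makes the file object perform this encode step on every write.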
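Note that the codecs module is only needed for this in Python 2.x. In Python 3.x, the built-in open function accepts an encoding argument itself, so open("movies.txt", encoding='utf-8') does the same job; without that argument it falls back to a platform-dependent default encoding.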
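As a rough Python 3 counterpart, the logging part could look like the sketch below. The helpers split_title_year and log_not_found are hypothetical names introduced here, not part of the original script, and the sample line is made up. Unlike the original, the sketch also strips surrounding whitespace from the title, which is an assumption about what the file lines look like. The network request is left out so the example stays self-contained.

```python
import os
import tempfile


def split_title_year(line):
    """Split 'Title (Year)' into (title, year); year is None when absent."""
    line = line.strip()
    start = line.find("(")
    end = line.find(")", start)
    if start == -1 or end == -1:
        return line, None
    return line[:start].strip(), line[start + 1:end]


def log_not_found(path, title, year=None):
    """Append a 'not found' entry; the file object encodes text as UTF-8."""
    with open(path, "a", encoding="utf-8") as log:
        if year:
            log.write("Couldn't find {} ({})\n".format(title, year))
        else:
            log.write("Couldn't find {}\n".format(title))


# Usage with a made-up sample line and a temporary log file.
log_path = os.path.join(tempfile.mkdtemp(), "log_not_found.txt")
title, year = split_title_year("Caché (2005)\n")
log_not_found(log_path, title, year)
```

Passing encoding="utf-8" explicitly keeps the behaviour the same across platforms instead of relying on the locale's default encoding.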
Upvotes: 1