Reputation: 471
Pardon my ugly newb code, I'm learning. I'm pulling movie data from the OMDB API, but when I write it to CSV I get a UnicodeEncodeError for many films, likely because actor names have accents, for instance. I want to 1) identify which films are problematic, 2) skip them, and/or 3) preferably correct the error. What I have currently just skips the whole write when an error occurs. Looking for a simple fix, since I'm a novice.
import csv
import os
import json
import omdb

movie_list = ['A Good Year', 'A Room with a View', 'Anchorman', 'Amélie', 'Annie Hall', 'Before Sunrise']
data_list = []
textdoc = open('textdoc.txt', 'w')

for w in movie_list:
    x = omdb.request(t=w, fullplot=True, tomatoes=True, r='json')
    y = x.content
    z = json.loads(y)
    data_list.append([z["Title"], z["Year"], z["Actors"], z["Awards"], z["Director"], z["Genre"], z["Metascore"], z["Plot"], z["Rated"], z["Runtime"], z["Writer"], z["imdbID"], z["imdbRating"], z["imdbVotes"], z["tomatoRating"], z["tomatoReviews"], z["tomatoFresh"], z["tomatoRotten"], z["tomatoConsensus"], z["tomatoUserMeter"], z["tomatoUserRating"], z["tomatoUserReviews"]])

try:
    with open('Films.csv', 'w') as g:
        a = csv.writer(g, delimiter=',')
        a.writerow(["Title", "Year", "Actors", "Awards", "Director", "Genre", "Metascore", "Plot", "Rated", "Runtime", "Writer", "imdbID", "imdbRating", "imdbVotes", "tomatoRating", "tomatoReviews", "tomatoFresh", "tomatoRotten", "tomatoConsensus", "tomatoUserMeter", "tomatoUserRating", "tomatoUserReviews"])
        a.writerows(data_list)
except UnicodeEncodeError:
    print("fail")
Upvotes: 3
Views: 5067
Reputation: 1
The solution that works for me is to add the following at the beginning of the export procedure:
import sys
reload(sys)
sys.setdefaultencoding('utf8')
Upvotes: -1
Reputation: 178179
If using Python 2, csvwriter doesn't really support Unicode, but there is an example in the csv documentation to work around it. An example is in this answer.
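For reference, a rough sketch of that Python 2 route, assuming every field from json.loads comes back as a unicode string and using a hypothetical encode_row helper: encode each field to UTF-8 bytes before handing rows to the writer, and open the file in binary mode as the Python 2 csv docs recommend.

import csv

# Python 2 sketch: encode unicode fields to UTF-8 bytestrings so csv.writer can write them.
# encode_row is a hypothetical helper, not part of the csv module.
def encode_row(row):
    return [f.encode('utf-8') if isinstance(f, unicode) else f for f in row]

with open('Films.csv', 'wb') as g:   # binary mode for csv on Python 2
    a = csv.writer(g, delimiter=',')
    a.writerow(["Title", "Year", "Actors"])   # header shortened for the sketch
    for row in data_list:
        a.writerow(encode_row(row))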
If using Python 3, then make the following changes:
y = x.content.decode('utf8')
and
with open('Films.csv', 'w', encoding='utf8', newline='') as g:
With these changes text is decoded to Unicode for processing within the Python script, and encoded back to UTF-8 when written to a file. This is the recommended way to deal with Unicode.
newline='' is the correct way to open a file for csv use. See this answer and the csv docs.
You can remove the try/except as well. It just suppresses useful tracebacks.
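Putting those two Python 3 changes into the question's script, a minimal sketch (field list shortened for brevity):

import csv
import json
import omdb

movie_list = ['A Good Year', 'Amélie', 'Before Sunrise']
data_list = []
for w in movie_list:
    x = omdb.request(t=w, fullplot=True, tomatoes=True, r='json')
    z = json.loads(x.content.decode('utf8'))   # decode bytes to str (Unicode) for processing
    data_list.append([z["Title"], z["Year"], z["Actors"]])   # shortened field list

# encoding='utf8' writes the text back out as UTF-8; newline='' is how the csv docs say to open the file
with open('Films.csv', 'w', encoding='utf8', newline='') as g:
    a = csv.writer(g, delimiter=',')
    a.writerow(["Title", "Year", "Actors"])
    a.writerows(data_list)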
Upvotes: 0
Reputation: 1044
Try out smart_str:
from django.utils.encoding import smart_str
data_list.append(map(smart_str, [z['element1'], z['element2']]))
a.writerow(map(smart_str, ["Title", "Year", "Actors", "Awards", "Director", "Genre", "Metascore", "Plot", "Rated", "Runtime", "Writer", "imdbID", "imdbRating", "imdbVotes", "tomatoRating", "tomatoReviews", "tomatoFresh", "tomatoRotten", "tomatoConsensus", "tomatoUserMeter", "tomatoUserRating", "tomatoUserReviews"]))
a.writerows(data_list)
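For context, a quick illustration of what smart_str does on Python 2 with Django's default settings (an assumption about the version in use): it coerces unicode to a UTF-8 bytestring, which csv.writer can write.

from django.utils.encoding import smart_str
# On Python 2, smart_str turns unicode into a UTF-8 encoded bytestring.
print(repr(smart_str(u'Amélie')))   # 'Am\xc3\xa9lie'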
Upvotes: 1
Reputation: 1234
Python 2.x: Instead of with open("Films.csv", 'w') as g: you could try using codecs to open the csv output with UTF-8 encoding.
import codecs
with codecs.open('Films.csv', 'w', encoding='UTF-8') as g:
# rest of code
Python 3.x: try opening g with UTF-8 encoding:
with open('Films.csv', 'w', encoding='UTF-8') as g:
# rest of code.
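A quick sanity check of the Python 3 variant with one of the question's accented titles (row values are just illustrative, and newline='' is taken from the earlier answer):

import csv
# Python 3: with encoding='UTF-8' the accented fields write without a UnicodeEncodeError.
with open('Films.csv', 'w', encoding='UTF-8', newline='') as g:
    a = csv.writer(g, delimiter=',')
    a.writerow(['Amélie', '2001', 'Audrey Tautou, Mathieu Kassovitz'])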
Upvotes: 7