Kees

Reputation: 471

UnicodeEncodeError with csv.writer

Pardon my ugly newb code, I'm learning. I'm pulling movie data from the OMDb API, but when I write it to CSV I get a UnicodeEncodeError for many films, likely because some fields (actor names, for instance) contain accented characters. I want to 1) identify which films are problematic, 2) skip them, and/or 3) preferably correct the error. What I have currently just abandons the whole write when an error occurs. I'm looking for a simple fix, since I'm a novice.

import csv
import os
import json
import omdb

movie_list = ['A Good Year', 'A Room with a View', 'Anchorman', 'Amélie', 'Annie Hall', 'Before Sunrise']

data_list = []

textdoc = open('textdoc.txt','w')

for w in movie_list:
    x = omdb.request(t=w, fullplot=True, tomatoes=True, r='json')
    y = x.content
    z = json.loads(y)
    data_list.append([z["Title"], z["Year"], z["Actors"], z["Awards"], z["Director"], z["Genre"], z["Metascore"], z["Plot"], z["Rated"], z["Runtime"], z["Writer"], z["imdbID"], z["imdbRating"], z["imdbVotes"], z["tomatoRating"], z["tomatoReviews"], z["tomatoFresh"], z["tomatoRotten"], z["tomatoConsensus"], z["tomatoUserMeter"], z["tomatoUserRating"], z["tomatoUserReviews"]])

try:
    with open('Films.csv', 'w') as g:
        a = csv.writer(g, delimiter=',')
        a.writerow(["Title", "Year", "Actors", "Awards", "Director", "Genre", "Metascore", "Plot", "Rated", "Runtime", "Writer", "imdbID", "imdbRating", "imdbVotes", "tomatoRating", "tomatoReviews", "tomatoFresh", "tomatoRotten", "tomatoConsensus", "tomatoUserMeter", "tomatoUserRating", "tomatoUserReviews"])
        a.writerows(data_list)
except UnicodeEncodeError:
    print("fail")

Upvotes: 3

Views: 5067

Answers (4)

Ether

Reputation: 1

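The solution that works for me is to add the following at the beginning of the export procedure (Python 2 only; the reload builtin and sys.setdefaultencoding do not exist in Python 3):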

import sys
reload(sys)
sys.setdefaultencoding('utf8')

Upvotes: -1

Mark Tolonen

Reputation: 178179

If using Python 2, csv.writer doesn't really support Unicode, but there is an example in the csv documentation that works around it. An example is in this answer.

If using Python 3, then make the following changes:

y = x.content.decode('utf8')

and

with open('Films.csv', 'w', encoding='utf8', newline='') as g:

With these changes, text is decoded to Unicode for processing within the Python script and encoded back to UTF-8 when written to the file. This is the recommended way to deal with Unicode.

newline='' is the correct way to open a file for csv use. See this answer and the csv docs.

You can remove the try/except as well. It just suppresses useful tracebacks.
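
As a rough end-to-end sketch of the Python 3 changes above: the bytes in raw are a made-up stand-in for x.content (not real OMDb output), only three columns are used, and the file goes to a temporary directory so the example can be run anywhere.

```python
import csv
import json
import os
import tempfile

# Hypothetical stand-in for x.content: raw UTF-8 bytes of a JSON response.
raw = '{"Title": "Amélie", "Year": "2001", "Actors": "Audrey Tautou"}'.encode('utf8')

# Decode bytes to str first, then parse the JSON text.
z = json.loads(raw.decode('utf8'))

# Write to a temporary location; in the question this would be 'Films.csv'.
path = os.path.join(tempfile.mkdtemp(), 'Films.csv')
with open(path, 'w', encoding='utf8', newline='') as g:
    a = csv.writer(g, delimiter=',')
    a.writerow(['Title', 'Year', 'Actors'])
    a.writerow([z['Title'], z['Year'], z['Actors']])

# Read the file back with the same encoding to confirm the accent survived.
with open(path, encoding='utf8', newline='') as g:
    rows = list(csv.reader(g))

print(rows)
```

Note that some spreadsheet programs guess the encoding when opening a CSV; if accents look wrong there, the file itself may still be correct UTF-8.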

Upvotes: 0

minocha

Reputation: 1044

Try out smart_str from Django. Note that the csv writer methods are writerow and writerows:

from django.utils.encoding import smart_str
data_list.append(map(smart_str, [z['element1'], z['element2']]))
a.writerow(map(smart_str, ["Title", "Year", "Actors", "Awards", "Director", "Genre", "Metascore", "Plot", "Rated", "Runtime", "Writer", "imdbID", "imdbRating", "imdbVotes", "tomatoRating", "tomatoReviews", "tomatoFresh", "tomatoRotten", "tomatoConsensus", "tomatoUserMeter", "tomatoUserRating", "tomatoUserReviews"]))
a.writerows(data_list)

Upvotes: 1

Cory Shay

Reputation: 1234

Python 2.x: Instead of with open("Films.csv", 'w') as g: you could use codecs to open the CSV output with UTF-8 encoding.

import codecs
with codecs.open('Films.csv', 'w', encoding='UTF-8') as g:
    # rest of code

Python 3.x: Try opening g with UTF-8 encoding:

with open('Films.csv', 'w', encoding='UTF-8') as g:
    # rest of code
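
If you also want to see which films would have failed without an explicit encoding (point 1 of the question), something like the following sketch might help. unencodable_rows is a hypothetical helper written for this illustration, the sample rows are made up, and 'ascii' is only an assumed stand-in for whatever default encoding your platform was using.

```python
def unencodable_rows(rows, encoding):
    """Return (index, row) pairs holding text that `encoding` cannot represent."""
    bad = []
    for i, row in enumerate(rows):
        try:
            for cell in row:
                # str() guards against non-string cells such as numbers.
                str(cell).encode(encoding)
        except UnicodeEncodeError:
            bad.append((i, row))
    return bad


# Made-up sample rows in the same shape as data_list (title, year).
sample = [['A Good Year', '2006'], ['Amélie', '2001'], ['Annie Hall', '1977']]

problem = unencodable_rows(sample, 'ascii')  # assumed restrictive default
ok_utf8 = unencodable_rows(sample, 'utf8')   # UTF-8 can encode every row

print(problem)
print(ok_utf8)
```

With UTF-8 nothing is flagged, which is why opening the file with an explicit UTF-8 encoding makes skipping films unnecessary.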

Upvotes: 7
