Reputation: 581
I'm not sure what to do about this error. I assumed it had to do with needing to add .encode('utf-8'), but I'm not sure whether that's what I need to do, or where I should apply it.
The error is:
  line 40, in <module>
    writer.writerows(list_of_rows)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in position 17: ordinal not in range(128)
This is the base of my Python script:
import csv
import requests
from BeautifulSoup import BeautifulSoup

url = 'https://dummysite'
response = requests.get(url)
html = response.content
soup = BeautifulSoup(html)
table = soup.find('table', {'class': 'table'})

list_of_rows = []
for row in table.findAll('tr')[1:]:  # skip the header row
    list_of_cells = []
    for cell in row.findAll('td'):
        # strip square brackets; cell.text is a unicode string
        text = cell.text.replace('[','').replace(']','')
        list_of_cells.append(text)
    list_of_rows.append(list_of_cells)

outfile = open("./test.csv", "wb")
writer = csv.writer(outfile)
writer.writerow(["Name", "Location"])
writer.writerows(list_of_rows)
Upvotes: 12
Views: 14141
Reputation: 135
The issue lies with the csv module in Python 2. From the unicodecsv project page:
Python 2’s csv module doesn’t easily deal with unicode strings, leading to the dreaded “‘ascii’ codec can’t encode characters in position …” exception.
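If you can, just install unicodecsv:
pip install unicodecsv
and use it in place of csv:
import unicodecsv

with open("test.csv", "wb") as csvfile:
    writer = unicodecsv.writer(csvfile)
    writer.writerow(row)  # row is a list of unicode strings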
Upvotes: 1
Reputation: 27714
The Python 2.x csv module is broken for Unicode. You have three options, in order of complexity:
1. (Edit: see below.) Use the fixed library https://github.com/jdunck/python-unicodecsv (pip install unicodecsv) as a drop-in replacement. Example:
import unicodecsv

with open("myfile.csv", 'rb') as my_file:
    r = unicodecsv.DictReader(my_file, encoding='utf-8')
2. Read the csv module documentation regarding Unicode: https://docs.python.org/2/library/csv.html (see the examples at the bottom).
3. Manually encode each item as UTF-8:
for cell in row.findAll('td'):
    text = cell.text.replace('[','').replace(']','')
    list_of_cells.append(text.encode("utf-8"))
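To see why manual encoding helps, here is a small Python 3 sketch (the helper name encode_row and the sample data are hypothetical). In Python 2, csv.writer implicitly converts unicode cells with the ASCII codec, and that is what fails on u'\u2013'; encoding each cell to UTF-8 bytes first avoids the implicit ASCII step. The sketch reproduces both behaviours with explicit str.encode calls rather than with the Python 2 csv module itself.

```python
# Python 3 sketch of the codec behaviour behind the Python 2 error.
# encode_row is a hypothetical helper, not part of the csv module.

def encode_row(cells, encoding="utf-8"):
    """Encode every text cell in a row to bytes with the given codec."""
    return [cell.encode(encoding) for cell in cells]

row = ["Paris", "Jan\u2013Mar"]  # contains an EN DASH (U+2013)

# The ASCII codec has no mapping for U+2013, so this raises the same
# kind of UnicodeEncodeError the question reports.
try:
    encode_row(row, "ascii")
except UnicodeEncodeError as exc:
    print("ascii failed:", exc.reason)  # ascii failed: ordinal not in range(128)

# UTF-8 can represent every code point; the en dash becomes three bytes.
print(encode_row(row))  # [b'Paris', b'Jan\xe2\x80\x93Mar']
```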
Edit: I found that python-unicodecsv is also broken when reading UTF-16; it complains about any 0x00 bytes.
Instead, use https://github.com/ryanhiebert/backports.csv, which more closely resembles the Python 3 implementation and uses the io module.
Install:
pip install backports.csv
Usage:
from backports import csv
import io

with io.open(filename, encoding='utf-8') as f:
    r = csv.reader(f)
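backports.csv follows the Python 3 model, in which the io layer decodes bytes to text and the csv reader only ever sees decoded strings. As a rough illustration of that model using only the Python 3 standard library (not backports.csv itself; the read_rows name and the sample bytes are hypothetical), this sketch decodes in io.TextIOWrapper and then parses with csv.reader:

```python
import csv
import io

def read_rows(raw_bytes, encoding="utf-8"):
    """Decode bytes in the io layer, then let csv parse the resulting text."""
    # TextIOWrapper does the decoding; newline="" leaves line-ending handling
    # to the csv module, as the Python 3 csv docs recommend.
    with io.TextIOWrapper(io.BytesIO(raw_bytes), encoding=encoding, newline="") as f:
        return list(csv.reader(f))

data = "Name,Location\r\nAlice,Jan\u2013Mar\r\n".encode("utf-8")
rows = read_rows(data)
print(rows)  # second row's Location cell is the str 'Jan\u2013Mar'
```

Swapping encoding="utf-16" (with data encoded the same way) is where this model differs from unicodecsv: the 0x00 bytes are consumed by the decoder before csv ever runs.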
Upvotes: 25
Reputation: 38442
In addition to Alastair's excellent suggestions, I found the easiest option was to use Python 3 instead of Python 2. All it required in my script was changing "wb" in the open statement to simply "w", in accordance with Python 3's syntax.
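A minimal Python 3 sketch of the question's writing step, assuming list_of_rows already holds lists of str (the sample rows and the temporary path are made up for illustration). Besides "w" instead of "wb", the Python 3 csv docs recommend opening the file with newline="", and passing encoding="utf-8" explicitly avoids depending on the platform's default encoding:

```python
import csv
import os
import tempfile

# Hypothetical scraped data; the en dash is what broke the Python 2 version.
list_of_rows = [["Alice", "Jan\u2013Mar"], ["Bob", "Oslo"]]

path = os.path.join(tempfile.mkdtemp(), "test.csv")

# Text mode ("w"), not "wb": Python 3's csv.writer writes str, not bytes.
with open(path, "w", newline="", encoding="utf-8") as outfile:
    writer = csv.writer(outfile)
    writer.writerow(["Name", "Location"])
    writer.writerows(list_of_rows)

# Read the file back to confirm the en dash survived the round trip.
with open(path, newline="", encoding="utf-8") as infile:
    print(list(csv.reader(infile)))
```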
Upvotes: 0