Milliways

Python BeautifulSoup or CSV encoding issue with &nbsp;

I was looking for a way to convert an HTML table to CSV and came across the following answer, which looked promising (I am also trying to learn Python): https://stackoverflow.com/a/16697784/838253

Unfortunately, it doesn't work on my samples, and I get the following error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 753: ordinal not in range(128)

This seems to be caused by BeautifulSoup's stripped_strings converting non-breaking spaces (&nbsp;) into u'\xa0'. That looks like perfectly normal Unicode (although converting multiple &nbsp; into a single u'\xa0' seems a bit off).
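A minimal Python 3 sketch of where the error comes from, using only the standard library. Here `html.unescape` stands in for BeautifulSoup's entity decoding (an assumption for illustration, not the code from the linked answer), and the sample string is made up. `&nbsp;` decodes to U+00A0, which lies outside the 0-127 range the ASCII codec can represent:

```python
import html

# The &nbsp; entity names U+00A0 NO-BREAK SPACE, so decoding it yields '\xa0'.
text = html.unescape("Price:&nbsp;10")
print(repr(text))  # 'Price:\xa010'

try:
    # The ASCII codec only covers code points 0-127; U+00A0 is 160.
    text.encode("ascii")
except UnicodeEncodeError as exc:
    # Same kind of error as in the question.
    print(exc)

# An encoding that covers all of Unicode, such as UTF-8, handles it fine.
encoded = text.encode("utf-8")
print(encoded)  # b'Price:\xc2\xa010'
```

Under Python 2.7 the same failure happens implicitly when a `unicode` object is passed to the `csv` writer, because it falls back to the default ASCII codec.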

The error seems to come from the csv module. Why can't it handle standard Unicode, and what is the best way to deal with this?

Answers (1)

oefe

In Python 2.7, the csv module doesn't support Unicode input; see the note at the beginning of its documentation.
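In Python 3 this limitation is gone: the csv module works on `str`, and the output encoding is chosen when the file is opened. A minimal sketch; the file name and sample rows are made up for illustration, and `newline=""` follows the csv documentation's recommendation for files passed to `csv.writer`/`csv.reader`:

```python
import csv
import os
import tempfile

# Sample rows containing a non-breaking space, as produced from &nbsp;.
rows = [["Name", "Price"], ["Widget", "10\xa0EUR"]]

# Hypothetical output location in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "table.csv")

# The encoding is set on the file; csv itself only sees str.
with open(path, "w", newline="", encoding="utf-8") as f:
    csv.writer(f).writerows(rows)

# Read it back with the same encoding to confirm the round trip.
with open(path, newline="", encoding="utf-8") as f:
    read_back = list(csv.reader(f))

print(read_back == rows)  # True
```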

You can use the UnicodeWriter class from the examples section of the same documentation to write CSV data containing Unicode.
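The UnicodeWriter recipe itself is Python 2 code. Its idea is to let `csv.writer` format each row into an in-memory buffer, then encode that text and write the bytes to the real output stream. The sketch below is a hypothetical Python 3 adaptation of that idea (the class name `EncodingCSVWriter` is made up). It is mainly useful when you already hold a binary stream and cannot reopen it in text mode:

```python
import csv
import io


class EncodingCSVWriter:
    """Hypothetical adaptation of the UnicodeWriter idea: format each row
    as text in a buffer, then encode it and write bytes to a binary stream."""

    def __init__(self, stream, encoding="utf-8", dialect=csv.excel, **kwds):
        self.queue = io.StringIO()  # holds one formatted row at a time
        self.writer = csv.writer(self.queue, dialect=dialect, **kwds)
        self.stream = stream  # binary stream, e.g. open(path, "wb")
        self.encoding = encoding

    def writerow(self, row):
        # Let csv handle quoting and delimiters on plain text first.
        self.writer.writerow(row)
        data = self.queue.getvalue()
        # Encode the formatted line and hand the bytes to the target stream.
        self.stream.write(data.encode(self.encoding))
        # Empty the buffer so the next row starts fresh.
        self.queue.seek(0)
        self.queue.truncate(0)

    def writerows(self, rows):
        for row in rows:
            self.writerow(row)


out = io.BytesIO()
EncodingCSVWriter(out).writerows([["a", "b\xa0c"]])
print(out.getvalue())  # b'a,b\xc2\xa0c\r\n'
```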
