Reputation: 1689
I'm using BeautifulSoup, and I get back a string like this:
u'Dassault Myst\xe8re'
It's a unicode object, but I want it to display as:
'Dassault Mystère'
I have tried name = name.encode('utf-8'), as well as decode() and unicode().
The error I keep getting is:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe8'
My default encoding seems to be 'ascii': sys.getdefaultencoding() returns 'ascii' even though I have:
#!/usr/bin/env python
# encoding: utf-8
at the top of the file.
Hoping to solve this recurring Unicode issue once and for all!
Thanks
Upvotes: 4
Views: 5432
Reputation: 3742
I do not know how or where you get this message, but look at this example:
$ python
Python 2.6.1 (r261:67515, Jun 24 2010, 21:47:49)
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> txt = u'Dassault Myst\xe8re'
>>> txt
u'Dassault Myst\xe8re'
>>> print txt
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe8' in position 13: ordinal not in range(128)
>>> ^D
$ export LANG=en_US.UTF-8
$ python
Python 2.6.1 (r261:67515, Jun 24 2010, 21:47:49)
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> txt = u'Dassault Myst\xe8re'
>>> txt
u'Dassault Myst\xe8re'
>>> print txt
Dassault Mystère
>>> ^D
So, as you can see, if the console encoding is ASCII, print converts the unicode string to ASCII, and any character outside the ASCII range raises an exception.
But if the console accepts UTF-8 (here, after setting LANG=en_US.UTF-8), everything is displayed correctly.
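If changing LANG is not an option, one possible workaround is to encode explicitly instead of relying on the console's encoding. The following is only a sketch, not necessarily what the asker's script needs: the names raw and writer are illustrative, and the in-memory stream stands in for whatever file or output stream you actually write to. It is written to run unchanged on Python 2.6+ and Python 3.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals

import io

# A unicode string like the one BeautifulSoup returned in the question.
name = 'Dassault Myst\xe8re'

# Roughly what print attempts on an ASCII console: encoding with the
# ascii codec, which cannot represent the character U+00E8.
try:
    name.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Encoding explicitly to UTF-8 succeeds; the accented letter becomes
# the two bytes 0xC3 0xA8.
utf8_bytes = name.encode('utf-8')

# Writing through a text wrapper with an explicit encoding does not
# depend on LANG. "raw" is a stand-in for a real binary stream.
raw = io.BytesIO()
writer = io.TextIOWrapper(raw, encoding='utf-8')
writer.write(name)
writer.flush()
written = raw.getvalue()
```

Note that the "# encoding: utf-8" line at the top of a source file only tells Python how to decode literals in that file; it does not change sys.getdefaultencoding() or the encoding print uses.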
Upvotes: 1