Jason Christa

Reputation: 12508

Python unicode woes

What is the correct way to convert '\xbb' into a unicode string? I have tried the following and only get UnicodeDecodeError:

unicode('\xbb', 'utf-8')

'\xbb'.decode('utf-8')

Upvotes: 0

Views: 1018

Answers (3)

Bernhard

Reputation: 8851

I'm not sure what you are trying to do, but in Python 3 all strings are Unicode by default. In Python 2.x you have to write u'my unicode string \xbb' (or the double- or triple-quoted form) to get a unicode string. When you want to print unicode strings, you have to encode them in a character set that the output device (e.g. the terminal) supports, for instance u'my unicode string \xbb'.encode('iso-8859-1').
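For illustration, here is a minimal sketch of that encoding step. It assumes Python 3 (where str is already Unicode and the u prefix is optional); the variable names are made up for the example.

```python
# Assumption: Python 3, where every str literal is a Unicode string.
text = 'my unicode string \xbb'

# Encoding turns the Unicode string into bytes in a chosen character set.
latin1_bytes = text.encode('iso-8859-1')  # U+00BB becomes the single byte 0xBB
utf8_bytes = text.encode('utf-8')         # U+00BB becomes the two bytes 0xC2 0xBB

print(latin1_bytes)
print(utf8_bytes)
```

The same character ends up as different bytes depending on the codec, which is why the encoding has to match what the output device expects.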

Upvotes: 0

Ignacio Vazquez-Abrams

Reputation: 799310

Since it comes from Word it's probably CP1252.

>>> print '\xbb'.decode('cp1252')
»
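A rough Python 3 equivalent, as a sketch only (the snippet above is Python 2; in Python 3 the raw data would be a bytes object). It also shows why the UTF-8 attempt fails: 0xBB is a continuation byte in UTF-8 and cannot start a character on its own.

```python
# Assumption: Python 3, so the raw data is a bytes literal.
raw = b'\xbb'

# Decoding as UTF-8 fails because 0xBB is not a valid start byte.
utf8_error = None
try:
    raw.decode('utf-8')
except UnicodeDecodeError as exc:
    utf8_error = exc

# Decoding as CP1252 maps 0xBB to U+00BB (right-pointing double angle quote).
text = raw.decode('cp1252')
print(text)
```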

Upvotes: 8

Ioan Alexandru Cucu

Reputation: 12279

It looks to be Latin-1 encoded. You should use:

unicode('\xbb', 'Latin-1')
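As a side note, Latin-1 and CP1252 give the same result for this particular byte; they only differ for bytes in the 0x80 to 0x9F range. A small sketch, assuming Python 3 and using 0x93 purely as an illustrative byte that is not from the question:

```python
# Assumption: Python 3 bytes literals; 0x93 is a hypothetical example byte.
raw = b'\xbb'

# For 0xBB the two codecs agree: both yield U+00BB.
same = raw.decode('latin-1') == raw.decode('cp1252')

# For 0x93 they disagree: CP1252 gives a curly quote, Latin-1 a C1 control character.
cp1252_char = b'\x93'.decode('cp1252')
latin1_char = b'\x93'.decode('latin-1')

print(same, repr(cp1252_char), repr(latin1_char))
```

So for text that really comes from a Windows application, CP1252 is probably the safer guess, since curly quotes and similar characters would otherwise decode to control characters.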

Upvotes: 1
