John Greenall

Reputation: 1690

Python string encoding issues

Is there a function in Python that is equivalent to prefixing a string with 'u'?

Let's say I have a string:

a = 'C\xc3\xa9dric Roger'

and I want to convert it to:

b = u'C\xc3\xa9dric Roger'

so that I can compare it to other unicode objects. How can I do this? My first instinct was to try:

>>> b = unicode(a)
Traceback (most recent call last):
  File "<string>", line 1, in <fragment>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1: ordinal not in range(128)

But that seems to be trying to decode the string. Is there a function for casting to unicode without doing any kind of decoding? (Is that what the 'u' prefix does or have I misunderstood?)

Upvotes: 2

Views: 143

Answers (1)

Martijn Pieters

Reputation: 1124398

You need to specify an encoding:

unicode(a, 'utf8')

or, using str.decode():

a.decode('utf8')

but do pick the right codec for your input; you clearly have UTF-8 data here but that may not always be the case.
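As a rough illustration of why the codec matters, here is a minimal sketch using the byte string from the question. The variable names `decoded` and `mojibake` are mine, and the `b''` prefix is only there so the snippet behaves the same on Python 2.6+ and Python 3.3+; in Python 2 a plain `'...'` literal is already a byte string.

```python
# The bytes from the question: UTF-8 encoded text.
a = b'C\xc3\xa9dric Roger'

# Decoding with the codec that matches the data: the two bytes
# 0xc3 0xa9 become the single code point U+00E9.
decoded = a.decode('utf8')

# Decoding with a codec that does not match the data: Latin-1 maps
# each byte to one code point, so the result is two wrong characters.
mojibake = a.decode('latin-1')

print(repr(decoded))
print(repr(mojibake))
print(len(a), len(decoded), len(mojibake))
```

Note that the Latin-1 result is equal to `u'C\xc3\xa9dric Roger'`, the value the question was aiming for. That suggests simply putting a `u` in front of those escapes would not give the intended text either, since the `\xc3\xa9` pair would be read as two code points rather than as one encoded character.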

To understand what this does, I urge you to read:

Upvotes: 7
