Reputation: 1690
Is there a function in Python that is equivalent to prefixing a string literal with 'u'?
Let's say I have a string:
a = 'C\xc3\xa9dric Roger'
and I want to convert it to:
b = u'C\xc3\xa9dric Roger'
so that I can compare it to other unicode objects. How can I do this? My first instinct was to try:
>>> b = unicode(a)
Traceback (most recent call last):
File "<string>", line 1, in <fragment>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1: ordinal not in range(128)
But that seems to be trying to decode the string. Is there a function for casting to unicode without doing any kind of decoding? (Is that what the 'u' prefix does or have I misunderstood?)
Upvotes: 2
Views: 143
Reputation: 1124398
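You need to specify an encoding. Without one, unicode() falls back to Python 2's default codec, ASCII, which is why it fails on byte 0xc3. Pass the codec explicitly: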
unicode(a, 'utf8')
or, using str.decode():
a.decode('utf8')
Do pick the right codec for your input, though: you clearly have UTF-8 data here, but that may not always be the case.
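This also answers the last part of the question: the u prefix does not cast bytes to text. It only makes the literal a unicode object, and inside such a literal each \xNN escape means the code point U+00NN. So u'C\xc3\xa9dric Roger' is 'C' followed by 'Ã' and '©', not 'Cédric'. The sketch below illustrates this; it assumes Python 2.6+ or 3.3+ (both accept b'' and u'' literals), and the name `wrong` is only illustrative.

```python
a = b'C\xc3\xa9dric Roger'           # UTF-8 encoded bytes (a str in Python 2)

# Decoding with the correct codec yields the text you actually want.
b = a.decode('utf8')
print(b == u'C\xe9dric Roger')       # True: \xe9 is U+00E9, 'é'

# A u'' literal decodes nothing: \xc3 and \xa9 are two separate
# code points (U+00C3 'Ã', U+00A9 '©'), so this is mojibake.
wrong = u'C\xc3\xa9dric Roger'
print(wrong == b)                    # False
print(len(b))                        # 12
print(len(wrong))                    # 13

# Decoding with the wrong codec "succeeds" but gives the same mojibake.
print(a.decode('latin-1') == wrong)  # True

# ASCII (Python 2's default for unicode(a)) cannot decode byte 0xc3.
try:
    a.decode('ascii')
except UnicodeDecodeError as exc:
    print(exc)                       # same error as in the question
```

The latin-1 line is the reason to choose the codec carefully: a wrong single-byte codec never raises, it just silently produces the wrong text.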
To understand what this does, I urge you to read:
The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky
Pragmatic Unicode by Ned Batchelder
Upvotes: 7