Reputation: 1832
I am trying to remove the hex-escaped characters \xef\xbb\xbf from my string, but I am getting the following error. I'm not quite sure how to resolve this.
>>> x = u'\xef\xbb\xbfHello'
>>> x
u'\xef\xbb\xbfHello'
>>> type(x)
<type 'unicode'>
>>> print x
Hello
>>> print x.replace('\xef\xbb\xbf', '')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 0: ordinal not in range(128)
>>>
Upvotes: 2
Views: 4566
Reputation: 177901
The real problem is that your Unicode string was decoded incorrectly in the first place. Those three characters are the UTF-8 encoding of the byte order mark (BOM, U+FEFF), mis-decoded as (most likely) Latin-1 or cp1252.
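To see where the three characters come from, here is a small Python 3 sketch (not part of the original Python 2 session; the variable names are illustrative). It encodes U+FEFF as UTF-8 and then decodes those bytes as Latin-1, which maps each byte to exactly one character:

```python
# The BOM is the code point U+FEFF; encoded as UTF-8 it becomes three bytes.
bom_bytes = "\ufeff".encode("utf-8")
print(bom_bytes)  # b'\xef\xbb\xbf'

# Decoding those same bytes as Latin-1 turns each byte into one character,
# which yields exactly the three characters from the question.
mis_decoded = bom_bytes.decode("latin-1")
print(repr(mis_decoded))  # 'ï»¿'  (the same string as '\xef\xbb\xbf')
```

For these three bytes, cp1252 gives the same result as Latin-1, which is why either codec could have produced the string.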
Ideally, fix the code that decoded them; otherwise, you can reverse the error by re-encoding as Latin-1 and then decoding correctly:
>>> x = u'\xef\xbb\xbfHello'
>>> x.encode('latin1').decode('utf8') # decode correctly, U+FEFF is a BOM.
u'\ufeffHello'
>>> x.encode('latin1').decode('utf-8-sig') # decode and handle BOM.
u'Hello'
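The same repair in Python 3, where every string literal is already Unicode, might look like the sketch below. `repair_mojibake` is a hypothetical helper name, and it assumes the text was produced by decoding UTF-8 bytes as Latin-1. If cp1252 was the culprit and the text contains characters such as '€', encode with 'cp1252' instead, because 'latin-1' cannot encode those.

```python
def repair_mojibake(text: str) -> str:
    """Undo a UTF-8 bytes -> Latin-1 mis-decode and drop a leading BOM.

    Hypothetical helper: assumes `text` came from decoding UTF-8 bytes
    as Latin-1. Other mistakes may raise or produce different garbage.
    """
    # Recover the original bytes (Latin-1 maps each char back to one byte),
    # then decode them as UTF-8; 'utf-8-sig' also strips a leading BOM.
    return text.encode("latin-1").decode("utf-8-sig")


x = "\xef\xbb\xbfHello"
print(repr(x.encode("latin-1").decode("utf-8")))  # '\ufeffHello'
print(repair_mojibake(x))                           # Hello
```

Using 'utf-8-sig' rather than 'utf-8' means the helper also works on strings that never had a BOM: it simply decodes them as UTF-8.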
Upvotes: 0
Reputation: 102
Try decoding the raw byte string with either the decode method or the unicode function, like so:
x.decode('utf-8')
or
unicode(string, 'utf-8')
Source: UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 1
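In Python 3 terms, that advice means decoding the data while it is still bytes. Here is a minimal sketch, assuming the data is available as bytes (`raw` is a hypothetical stand-in, e.g. the contents of a file). It shows that plain 'utf-8' keeps the BOM as U+FEFF, while 'utf-8-sig' removes it:

```python
import io

raw = b"\xef\xbb\xbfHello"  # hypothetical bytes, e.g. read from a file

# Plain UTF-8 decoding keeps the BOM as the character U+FEFF.
print(repr(raw.decode("utf-8")))      # '\ufeffHello'

# 'utf-8-sig' decodes UTF-8 and drops a leading BOM if one is present.
print(repr(raw.decode("utf-8-sig")))  # 'Hello'

# The same codec name works for text streams, and therefore for open().
stream = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8-sig")
text = stream.read()
print(repr(text))                     # 'Hello'
```

For a real file, `open(path, encoding="utf-8-sig")` follows the same approach.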
Upvotes: 0
Reputation: 17506
You need to pass unicode objects to replace; otherwise Python 2 will attempt to decode the str argument '\xef\xbb\xbf' with the ASCII codec so that it can search for it in the unicode string x, and byte 0xef is not valid ASCII.
>>> x = u'\xef\xbb\xbfHello'
>>> x
u'\xef\xbb\xbfHello'
>>> print(x.replace(u'\xef\xbb\xbf',u''))
Hello
This only applies to Python 2. In Python 3 both versions work, because plain string literals are already Unicode (str).
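A quick Python 3 check of that last point (a sketch, not from the original answer). It also shows one way to strip a correctly decoded BOM, where the character to remove is U+FEFF rather than the three mis-decoded characters:

```python
x = "\xef\xbb\xbfHello"

# In Python 3 the u prefix is optional: both literals are str objects,
# so no implicit ASCII conversion happens and both calls succeed.
plain = x.replace("\xef\xbb\xbf", "")
prefixed = x.replace(u"\xef\xbb\xbf", u"")
print(plain, prefixed)  # Hello Hello

# If the text was decoded correctly, the BOM is the single character U+FEFF;
# lstrip removes any leading U+FEFF characters.
bom_text = "\ufeffHello"
print(bom_text.lstrip("\ufeff"))  # Hello
```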
Upvotes: 3