Reputation: 18462
I have a problem with unescaping a unicode string. I tried the following, but it doesn't work with non-ASCII characters.
>>> s = ur"\'test\'"
>>> s.decode("string_escape")
"'test'"
>>> s = ur"\'test \u2014\'"
>>> s.decode("string_escape")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2014' in position 7: ordinal not in range(128)
Is there a better way to remove the backslashes?
Btw: I need this, because xmlrpclib.ServerProxy escapes the responses.
Edit: Here's an example for my xmlrpc request:
>>> import xmlrpclib
>>> server = xmlrpclib.ServerProxy("http://ws.audioscrobbler.com/2.0/")
>>> xml_data = server.tag.search({'api_key':'...','tag':'80s'})
>>> print xml_data
<?xml version=\"1.0\" encoding=\"utf-8\"?>
<lfm status=\"ok\">
<results for=\"80s\" xmlns:opensearch=\"http://a9.com/-/spec/opensearch/1.1/\">
<opensearch:Query role=\"request\" searchTerms=\"80s\" startPage=\"1\" />
...
I think the escapes come from the xmlrpc server.
Upvotes: 0
Views: 3273
Reputation: 879869
Interestingly, the error you posted does not seem to occur using Python 2.6.4:
In [110]: s = ur"\'test\'"
In [111]: s.decode("string_escape")
Out[111]: "'test'"
In [112]: s = ur"\'test \u2014\'"
In [113]: s.decode("string_escape")
Out[113]: "'test \xe2\x80\x94'"
In [114]: print(s.decode("string_escape"))
'test —'
Upvotes: 0
Reputation: 21055
First, there are "string_escape" and "unicode_escape"; neither can decode the string that you have given. The first reads a bytestring escaped as a bytestring and decodes it to a bytestring. The second reads a unicode string that was escaped and stored in a bytestring, so it can't take a unicode object as input, at least not one that contains non-ASCII characters.
I believe that the raw string you've given here is wrong, and you actually want s.decode('unicode_escape') for the real strings coming from your source.
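To make the distinction concrete, here is a minimal sketch of both situations. It assumes the data looks like the sample in the question; the variable names are only illustrative. The second case (a text object that already contains a real non-ASCII character) uses the "backslashreplace" error handler to get back to an ASCII bytestring first, which is one possible workaround and not something confirmed to match your actual input.

```python
# -*- coding: utf-8 -*-

# Case 1: the escaped text arrives as a bytestring (assumed input).
raw_bytes = b"\\'test \\u2014\\'"
from_bytes = raw_bytes.decode("unicode_escape")

# Case 2: a text object holding literal backslashes plus a real U+2014,
# which is what ur"\'test \u2014\'" produces in Python 2.
raw_text = u"\\'test \u2014\\'"
# Turn non-ASCII characters back into \uXXXX escapes, giving ASCII bytes...
ascii_bytes = raw_text.encode("ascii", "backslashreplace")
# ...which "unicode_escape" can then decode in one pass.
from_text = ascii_bytes.decode("unicode_escape")
```

Both results should be the same text with the backslashes removed. Note that this round trip cannot tell an intentional literal backslash from an escape, so it only fits data where every backslash is an escape.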
If I'm incorrect, the best you can do is to manually escape any unescaped single quotes with re, put extra single quotes around it and use ast.literal_eval.
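import ast
import re

def substitute(match):
    # An odd number of backslashes means the quote is already escaped.
    if len(match.group(1)) % 2 == 1:
        return match.group()
    else:
        return ur"%s\%s" % (match.group(1), match.group(2))

# \\* rather than \\+, so quotes with no preceding backslash are escaped too.
ast.literal_eval("'%s'" % re.sub(ur"(\\*)(')", substitute, s))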
A third option is that the string needs to be passed to ast.literal_eval without any additional work on your part. Which of the three applies depends on exactly what you have as a string.
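As a rough illustration of that third case, the sketch below assumes the incoming value is already a complete Python literal, including its own quotes and the u prefix that Python 2's repr would add. The sample value is made up for the example.

```python
import ast

# Hypothetical input: the text of a Python string literal, quotes included.
quoted = u"u'test \\u2014 \\'quoted\\''"

# literal_eval parses the literal safely, without executing arbitrary code.
value = ast.literal_eval(quoted)
```

If your string does not carry its own surrounding quotes, this case does not apply and one of the other two options is the better fit.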
Another suspicion I have is that it might be a JSON object. You should give an example of the string that you're getting, and say where you are getting it from and how.
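If the JSON guess is right, the escaped quotes would disappear by parsing the value as a JSON string. This is only a sketch: the payload below is invented to resemble the output in the question, and it assumes the value is wrapped in double quotes as a JSON string would be.

```python
import json

# Hypothetical payload: a JSON string literal containing escaped quotes.
payload = u'"<lfm status=\\"ok\\">\\u2014</lfm>"'

# json.loads handles both the \" escapes and the \uXXXX escapes.
decoded = json.loads(payload)
```

If json.loads raises a ValueError on your real data, the response is probably not JSON and one of the earlier options applies instead.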
Upvotes: 2