Reputation: 96
>>> test
u'"Hello," he\u200b said\u200f\u200e.\n\t"I\u200b am\u200b nine years old\xe2"'
>>> test2
'"Hello," he\\u200b said\\u200f\\u200e.\n\t"I\\u200b am\\u200b nine years old"'
>>> print test
"Hello," he said.
"I am nine years oldâ"
>>> print test2
"Hello," he\u200b said\u200f\u200e.
"I\u200b am\u200b nine years old"
So how would I convert from test2 to test (i.e. so that the Unicode characters are printed)? .decode('utf-8') doesn't do it.
Upvotes: 3
Views: 3090
Reputation: 369224
You can use the unicode-escape encoding to decode '\\u200b' to u'\u200b'.
>>> test1 = u'"Hello," he\u200b said\u200f\u200e.\n\t"I\u200b am\u200b nine years old\xe2"'
>>> test2 = '"Hello," he\\u200b said\\u200f\\u200e.\n\t"I\\u200b am\\u200b nine years old"'
>>> test2.decode('unicode-escape')
u'"Hello," he\u200b said\u200f\u200e.\n\t"I\u200b am\u200b nine years old"'
>>> print test2.decode('unicode-escape')
"Hello," he said.
"I am nine years old"
Note: even after decoding, test2 does not exactly match test1, because test1 contains an extra u'\xe2' just before the closing quote (").
>>> test1 == test2.decode('unicode-escape')
False
>>> test1.replace(u'\xe2', '') == test2.decode('unicode-escape')
True
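The session above is Python 2. For readers on Python 3, where str has no .decode method, a minimal sketch of the same idea follows. It assumes the input contains only ASCII characters plus literal backslash escapes, as test2 does; the helper name decode_escapes is hypothetical and used only for illustration.

```python
def decode_escapes(text):
    """Turn literal escapes such as a backslash followed by u200b into real characters.

    Assumption: text is ASCII-only; non-ASCII input raises UnicodeEncodeError here.
    """
    # Python 3 str has no .decode(), so go through bytes first,
    # then let the unicode_escape codec interpret the escapes.
    return text.encode('ascii').decode('unicode_escape')


# Same value as test2 in the question: the \\u sequences are literal text,
# while \n and \t are a real newline and a real tab.
test2 = '"Hello," he\\u200b said\\u200f\\u200e.\n\t"I\\u200b am\\u200b nine years old"'
decoded = decode_escapes(test2)
print(repr(decoded))
```

If the input may also contain non-ASCII characters, this sketch is not sufficient as written, since the unicode_escape codec treats the bytes it is given as Latin-1.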
Upvotes: 5