Wang Nick

Reputation: 495

Replacing unicode in python

Here is the result I printed in Python:

With \u003cb\u003eall\u003c/b\u003e respect, if we look from one perspective, it is just like looking at ants.

and the data type is

<type 'unicode'>

Is there a way to replace \u003cb\u003e with ''? I have tried

str.replace("\u003cb\u003e", ''), str.replace("\\u003cb\\u003e", '') and str.replace("<b>", ''), but none of them worked. How can I properly replace it with an empty string?

Edit:

Here is the result of print repr(mystring):

With \\u003cb\\u003eall\\u003c/b\\u003e respect, if we look from one perspective, it is just like looking at ants.
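The doubled backslashes in the repr mean the string holds the literal six-character text \u003c, not a "<" character. A minimal Python 3 sketch (where str plays the role of Python 2's unicode) of why a pattern written with single backslashes cannot match: the Python parser turns it into "<b>" before replace() ever sees it. The variable name follows the edit above; the text is shortened for illustration.

```python
# Python 3 sketch: the data holds literal backslash sequences, not "<" or ">".
# A raw string literal keeps the backslashes exactly as they appear in the repr.
mystring = r"With \u003cb\u003eall\u003c/b\u003e respect"

# In a normal literal the parser turns "\u003c" into "<", so this pattern is "<b>".
pattern_parsed = "\u003cb\u003e"
print(pattern_parsed)                 # <b>
print(pattern_parsed in mystring)     # False: the data has no real "<" character

# Doubling the backslashes keeps them literal, so this pattern does match.
pattern_literal = "\\u003cb\\u003e"
print(pattern_literal in mystring)    # True
print(mystring.replace(pattern_literal, ""))
```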

Upvotes: 1

Views: 2183

Answers (1)

Mark Tolonen

Reputation: 177674

If you actually want to remove them completely, your second example should have worked. Using Unicode strings is more efficient, though, since an implicit conversion is eliminated:

>>> s=u'With \\u003cb\\u003eall\\u003c/b\\u003e respect, if we look from one perspective, it is just like looking at ants.'
>>> s.replace(u'\\u003cb\\u003e',u'').replace(u'\\u003c/b\\u003e',u'')
u'With all respect, if we look from one perspective, it is just like looking at ants.'
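The two chained replace() calls can also be folded into a single regular-expression substitution. This is a Python 3 sketch, assuming only escaped <b> and </b> tags need removing; the helper name strip_escaped_bold is hypothetical.

```python
import re

# Matches the literal text \u003cb\u003e or \u003c/b\u003e.
# Inside a raw-string regex, "\\" matches one literal backslash.
ESCAPED_BOLD = re.compile(r"\\u003c/?b\\u003e")

def strip_escaped_bold(text: str) -> str:
    """Remove escaped <b>/</b> tags that appear as literal \\u003c...\\u003e text."""
    return ESCAPED_BOLD.sub("", text)

s = r"With \u003cb\u003eall\u003c/b\u003e respect, if we look from one perspective, it is just like looking at ants."
print(strip_escaped_bold(s))
# With all respect, if we look from one perspective, it is just like looking at ants.
```

The optional "/?" lets one pattern cover both the opening and the closing tag, so the string is scanned once instead of twice.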

If you would rather convert the Unicode escapes into the characters they stand for, first encode the Unicode string (which contains only ASCII code points) with the 'ascii' codec to get a byte string, then decode that with the 'unicode-escape' codec to turn the literal escape sequences back into characters:

>>> print(s.encode('ascii').decode('unicode-escape'))
With <b>all</b> respect, if we look from one perspective, it is just like looking at ants.
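The same two-step conversion works in Python 3 on str. A sketch; the function name decode_escapes is hypothetical. Encoding with 'ascii' first means non-ASCII input raises UnicodeEncodeError instead of reaching the unicode-escape codec, which decodes non-escape bytes as Latin-1 and would garble them.

```python
def decode_escapes(text: str) -> str:
    """Turn literal \\uXXXX sequences in text into the characters they denote."""
    # 'ascii' raises UnicodeEncodeError on non-ASCII input rather than
    # letting unicode-escape misread UTF-8 bytes as Latin-1.
    return text.encode("ascii").decode("unicode-escape")

s = r"With \u003cb\u003eall\u003c/b\u003e respect, if we look from one perspective, it is just like looking at ants."
decoded = decode_escapes(s)
print(decoded)
# With <b>all</b> respect, if we look from one perspective, it is just like looking at ants.

# Once decoded, the question's replace("<b>", '') approach works on the real tags.
print(decoded.replace("<b>", "").replace("</b>", ""))
# With all respect, if we look from one perspective, it is just like looking at ants.
```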

Upvotes: 2
