Reputation: 4077
I have a string as follows:
str1 = "heylisten\uff08there is something\uff09to say \uffa9"
I need to surround each of the unicode values matched by my regex with a space on either side.
Desired output string:
out = "heylisten \uff08 there is something \uff09 to say \uffa9 "
I used re.findall to get all the matches and then replaced each one in a loop. It looks like:
p1 = re.findall(r'\uff[0-9a-e][0-9]', str1, flags = re.U)
out = str1
for item in p1:
    print item
    print out
    out = re.sub(item, r" " + item + r" ", out)
And this outputs:
'heylisten\\ uff08 there is something\\ uff09 to say \\ uffa9 '
What is wrong with the above that it prints an extra "\" and also separates it from uff? I even tried with re.search, but it seems to only separate \uff08. Is there a better way?
Upvotes: 0
Views: 229
Reputation: 799580
I have a string as follows:
str1 = "heylisten\uff08there is something\uff09to say \uffa9"
I need to replace the unicode values ...
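You don't have any unicode values. You have a bytestring, in which \uff08 is just a backslash followed by the characters uff08. Use a unicode literal instead: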
str1 = u"heylisten\uff08there is something\uff09to say \uffa9"
...
p1 = re.sub(ur'([\uff00-\uffe9])', r' \1 ', str1)
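For readers on Python 3, the same character-class idea can be sketched as follows. This is only an illustration and assumes Python 3.3 or later, where every str is already Unicode and the re module itself understands \uXXXX escapes in a raw pattern, so neither the u nor the ur prefix is needed.

```python
import re

# Python 3: a plain string literal is already Unicode,
# so each \uffXX below is a single character.
str1 = "heylisten\uff08there is something\uff09to say \uffa9"

# Match any single character in the range U+FF00..U+FFE9
# and put one space on each side of it.
out = re.sub(r'([\uff00-\uffe9])', r' \1 ', str1)
print(out)
```

Because the input already has a space before the last character, the result contains two spaces at that position.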
Upvotes: 1
Reputation: 67998
print re.sub(r"(\\uff[0-9a-e][0-9])", r" \1 ", x)
You can use this re.sub directly. See demo:
http://regex101.com/r/sU3fA2/67
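import re
# \\ in the raw pattern matches one literal backslash in front of uffXX.
p = re.compile(r'(\\uff[0-9a-e][0-9])')
# Raw bytestring: the text holds the literal characters \uff08, not U+FF08.
test_str = r"heylisten\uff08there is something\uff09to say \uffa9"
# Raw replacement so that \1 is a backreference, not the control character \x01.
subst = r" \1 "
result = re.sub(p, subst, test_str)
print result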
Output:
heylisten \uff08 there is something \uff09 to say \uffa9
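A hedged Python 3 sketch of the same approach, for the case where the text really contains a literal backslash followed by uffXX (as in the question's bytestring) rather than the Unicode characters themselves. The variable names here are illustrative only.

```python
import re

# Raw string: the text contains a literal backslash followed by "uff08",
# not the single character U+FF08.
text = r"heylisten\uff08there is something\uff09to say \uffa9"

# \\ matches one literal backslash; the group captures the whole
# six-character sequence such as \uff08.
pattern = re.compile(r'(\\uff[0-9a-e][0-9])')

# Raw replacement string so that \1 refers back to the captured group.
result = pattern.sub(r' \1 ', text)
print(result)
```

Matching the backslash as part of the group is what keeps it attached to uff08; a pattern that matches only uff08 would insert the space between the backslash and the rest.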
Upvotes: 1