Reputation: 3
I am trying to remove a pattern using the following code:
import re
x = "mr<u+092d><u+093e><u+0935><u+0941><u+0915>"
pattern = '[<u+0-9de>]'
re.sub(pattern, '', x)
Output
mr
This output is correct for the given sample string, but when I run this code on the corpus, it removes all occurrences of 'de' as well as digits, etc. I want these characters to be removed only when they appear inside < >.
Upvotes: 0
Views: 950
Reputation: 54148
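You need to put the < and > outside the character class. Inside [...] every listed character is matched on its own, so your pattern removes any <, u, +, digit, d, e, or > wherever it occurs. The structure will always be <, then u\+, then [0-9a-f]{4}, then >, following the Unicode code point notation.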
pattern = r'<u\+[0-9a-f]{4}>'
re.sub(pattern, '', x)
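To see the difference on text that is not just the sample string, here is a minimal sketch comparing the two patterns. The helper name strip_codepoint_escapes and the mixed test line are made up for illustration, and the re.IGNORECASE flag is an assumption in case the corpus also contains uppercase forms such as <U+092D>; drop it if that is not wanted.

```python
import re

# Matches one literal escape such as "<u+092d>": "<u+", four hex digits, ">".
# re.IGNORECASE is an assumption, in case the corpus also has "<U+092D>".
CODEPOINT_ESCAPE = re.compile(r'<u\+[0-9a-f]{4}>', re.IGNORECASE)


def strip_codepoint_escapes(text):
    """Remove every <u+XXXX> escape and leave all other text alone."""
    return CODEPOINT_ESCAPE.sub('', text)


sample = "mr<u+092d><u+093e><u+0935><u+0941><u+0915>"
mixed = "decode 2019 <u+092d>demo"  # hypothetical corpus line

print(strip_codepoint_escapes(sample))  # mr
print(strip_codepoint_escapes(mixed))   # decode 2019 demo

# The original character class, for comparison: it also eats the plain
# 'd', 'e' and digit characters outside the angle brackets.
print(repr(re.sub('[<u+0-9de>]', '', mixed)))
```

Compiling the pattern once is only a convenience when the same substitution is applied to many lines of a corpus; calling re.sub with the raw-string pattern directly behaves the same way.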
Upvotes: 1