Rajendra Nayal

Reputation: 3

Removing a specific pattern from a string using regex in Python

I am trying to remove a pattern using the following code:

import re

x = "mr<u+092d><u+093e><u+0935><u+0941><u+0915>"
pattern = '[<u+0-9de>]'
re.sub(pattern, '', x)

Output

mr

This output is correct for the sample string, but when I run the code on my corpus it also removes every occurrence of 'd', 'e', 'u', digits, and so on. I want these characters to be removed only when they appear inside < >.

Upvotes: 0

Views: 950

Answers (1)

azro

Reputation: 54148

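You need to put the < and > outside the character class. [<u+0-9de>] matches any single one of those characters anywhere in the text, which is why stray digits, 'd' and 'e' disappear too. Instead, match the whole token, whose structure is always:

  • starts with <
  • followed by u\+ (the + is escaped so it is taken literally)
  • 4 hexadecimal characters, [0-9a-f]{4}, as in the U+XXXX code point notation
  • ends with >

pattern = r'<u\+[0-9a-f]{4}>'
re.sub(pattern, '', x)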
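To see the difference on text that contains ordinary words and digits, here is a minimal sketch. The sample sentence, the TAG name, and the strip_escapes helper are made up for illustration and are not from the question; it also assumes the escapes are always lowercase with exactly 4 hex digits, as in the sample.

```python
import re

# Matches one complete token such as <u+092d>: a literal "<u+",
# exactly four lowercase hex digits, then ">".
TAG = re.compile(r"<u\+[0-9a-f]{4}>")


def strip_escapes(text):
    """Remove every <u+XXXX> token and leave all other text untouched."""
    return TAG.sub("", text)


# Hypothetical input: the sample from the question plus ordinary words.
sample = "mr<u+092d><u+093e><u+0935><u+0941><u+0915> made 2 demo edits"

cleaned = strip_escapes(sample)
print(cleaned)  # mr made 2 demo edits

# The character class from the question deletes single characters,
# so digits and the letters d, e, u vanish from normal words as well.
damaged = re.sub(r"[<u+0-9de>]", "", sample)
print(damaged)
```

If the corpus also contains uppercase escapes, passing re.IGNORECASE to re.compile would be one way to cover them.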


Upvotes: 1
