Samyak

Reputation: 400

Regular expression in Python doesn't work the way I expect

My code doesn't seem to be working like it's supposed to:

import re

x = "engniu4nwi5u"
print(re.sub(r"\D(\d)\D", r"\1abc", x))

My desired output is: engniuabcnwiabcu
But the output actually given is: engni4abcw5abc

Upvotes: 2

Views: 75

Answers (4)

Wiktor Stribiżew

Reputation: 626845

If you also want to replace digits at the beginning or end of the string, add ^ and $ as alternatives in the regex:

(\D|^)\d(?=$|\D)

And replace with \1abc.


Sample code on IDEONE:

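import re

# (\D|^) captures the preceding non-digit, or nothing at the start of the string
p = re.compile(r'(\D|^)\d(?=$|\D)')
test_str = "1engniu4nwi5u"
# Raw string, so \1 is a backreference and not the control character \x01
subst = r"\1abc"
print(re.sub(p, subst, test_str))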
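
A side note on the replacement string, as a minimal sketch: in a normal (non-raw) Python string literal, "\1" is an octal escape for the character with code 1, so the backreference only reaches re.sub if the replacement is written as a raw string (or with a doubled backslash). The input below is the sample string from this answer; the variable names are just for illustration.

```python
import re

pattern = re.compile(r'(\D|^)\d(?=$|\D)')
test_str = "1engniu4nwi5u"

# In a non-raw literal, "\1" is the single control character \x01.
assert "\1abc" == "\x01abc"

# A raw string keeps the backslash, so re.sub sees the backreference \1.
result = pattern.sub(r"\1abc", test_str)
print(result)  # abcengniuabcnwiabcu
```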

Upvotes: 0

moliware

Reputation: 10278

Based on your regexp:

>>> re.sub(r"(\D)\d", r"\1abc", x)
'engniuabcnwiabcu'

Although I would do this instead:

>>> re.sub(r"\d", "abc", x)
'engniuabcnwiabcu'
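
Note that the two patterns agree on the string from the question but are not equivalent in general. A rough sketch of the difference, where the second input y is a made-up example (not from the question) with a leading digit and two adjacent digits, and the function names are hypothetical:

```python
import re

def replace_after_non_digit(s):
    # Only a digit that directly follows a non-digit is replaced.
    return re.sub(r"(\D)\d", r"\1abc", s)

def replace_every_digit(s):
    # Every digit is replaced, wherever it appears.
    return re.sub(r"\d", "abc", s)

x = "engniu4nwi5u"
print(replace_after_non_digit(x) == replace_every_digit(x))  # True

y = "4ab12"  # hypothetical input
print(replace_after_non_digit(y))  # 4ababc2
print(replace_every_digit(y))      # abcababcabc
```

Which one is appropriate depends on whether leading or consecutive digits should be replaced too.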

Upvotes: 0

nu11p01n73R

Reputation: 26667

You are grouping the wrong characters. It should be written as

>>> x = "engniu4nwi5u"
>>> re.sub(r"(\D)\d(\D)", r"\1abc\2", x)
'engniuabcnwiabcu'
  • (\D) Matches a non-digit and captures it in \1
  • \d Matches the digit
  • (\D) Matches the following non-digit and captures it in \2

How does it match?

engniu4nwi5u
     |
    \D => \1

engniu4nwi5u
      |
     \d

engniu4nwi5u
       |
      \D => \2

Another Solution

You can also use lookarounds to achieve the same result:

>>> x = "engniu4nwi5u"
>>> re.sub(r"(?<=\D)\d(?=\D)", r"abc", x)
'engniuabcnwiabcu'
  • (?<=\D) Lookbehind assertion. Checks that the digit is preceded by a non-digit, without capturing or consuming it.
  • \d Matches the digit
  • (?=\D) Lookahead assertion. Checks that the digit is followed by a non-digit, also without capturing or consuming it.
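
The two solutions give the same result on the string from the question, but they can differ when two digits are separated by a single character, because the capturing version consumes the non-digit on each side while the lookarounds do not. A small sketch, using a made-up input that is not from the question:

```python
import re

s = "a1b2c"  # hypothetical input: digits separated by one character

# (\D)\d(\D) consumes "a1b", so the "b" is no longer available to precede "2".
captured = re.sub(r"(\D)\d(\D)", r"\1abc\2", s)

# Lookarounds only assert the neighbours, so both digits are matched.
lookaround = re.sub(r"(?<=\D)\d(?=\D)", "abc", s)

print(captured)    # aabcb2c
print(lookaround)  # aabcbabcc
```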

Upvotes: 2

Lucas Trzesniewski

Reputation: 51330

This is because you replaced the wrong part:

Let's consider the first match. \D\d\D matches the following:

engniu4nwi5u
     ^^^

4 is captured as \1. Then you replace the whole match with: \1abc, which becomes 4abc.
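
This can be checked directly by inspecting the first match object. A minimal sketch using the pattern and string from the question (the variable name m is just for illustration):

```python
import re

x = "engniu4nwi5u"
m = re.search(r"\D(\d)\D", x)

print(m.group(0))          # u4n  (the whole match, which gets replaced)
print(m.group(1))          # 4    (the only captured group)
print(m.span())            # (5, 8)
print(m.expand(r"\1abc"))  # 4abc (what the match is replaced with)
```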

You have a couple of solutions here:

  • Capture what you want to keep: (\D)\d(\D) and replace it with \1abc\2
  • Use lookarounds (a lookbehind and a lookahead): (?<=\D)\d(?=\D) and replace this with abc

Upvotes: 1
