Geekluca

Reputation: 31

Python empty matches replaced

I really don't understand the following example found on docs.python.org:

>>> p = re.compile('x*')

>>> p.sub('-', 'abxd')

'-a-b-d-'

Why is the regex 'x*' matching four times?

I thought the output should be: 'ab-d'

Upvotes: 3

Views: 129

Answers (2)

Peipei

Reputation: 176

One update: the behavior of re.sub changed in Python 3.7. From the docs:

Empty matches for the pattern are replaced when adjacent to a previous non-empty match.

On Python 3.7 and later the result is "-a-b--d-", because the empty match just before "d" is now replaced as well. In earlier versions of Python, this empty match was skipped because it is adjacent to the match of "x".
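
A quick way to check which behavior your interpreter has is to run the substitution and compare. This is only a sketch: the variable names are mine, and the pre-3.7 result is the one quoted in the question.

```python
import re
import sys

# Same pattern and input as in the question.
result = re.sub('x*', '-', 'abxd')

# Python 3.7+ also replaces the empty match right after "x";
# older versions skip it (output as quoted in the question).
expected = '-a-b--d-' if sys.version_info >= (3, 7) else '-a-b-d-'

print(result)
print(result == expected)
```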

Upvotes: 0

thefourtheye

Reputation: 239573

The * metacharacter matches zero or more times. So,

 a bx d
^ ^ -- ^

^ marks a position where x* matches zero times (an empty match), and -- marks the place where x* matches once (the x itself). That is why the output is -a-b-d-.
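
If it helps, you can list the matches yourself with re.finditer. A minimal sketch (the names are mine): every span whose start equals its end is an empty match. Note that on Python 3.7 and later the list also contains an empty match at (3, 3), right after the x.

```python
import re

# Collect (start, end) for every match of x* in 'abxd'.
spans = [m.span() for m in re.finditer('x*', 'abxd')]

# Empty matches have start == end; the only non-empty one is the "x".
empty = [s for s in spans if s[0] == s[1]]
non_empty = [s for s in spans if s[0] != s[1]]

print(spans)
print(non_empty)  # [(2, 3)]
```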

To get the output ab-d, you need to use x+ in the regular expression. It matches one or more times, so it never produces an empty match. It will match only at the following position:

abxd
  ^
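
For comparison, a short sketch of the x+ version. The inputs 'abxxxd' and 'abd' are my own examples, added to show that a run of x is replaced as a single match and that nothing is replaced when there is no x.

```python
import re

p = re.compile('x+')

one = p.sub('-', 'abxd')       # single x replaced once
run = p.sub('-', 'abxxxd')     # a run of x counts as one match
none = p.sub('-', 'abd')       # no x, so nothing to replace

print(one)   # ab-d
print(run)   # ab-d
print(none)  # abd
```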

Upvotes: 3
