Reputation: 31
I really don't understand the following example found on docs.python.org:
>>> import re
>>> p = re.compile('x*')
>>> p.sub('-', 'abxd')
'-a-b-d-'
Why is the regex 'x*' matching four times?
I thought the output should be: 'ab-d'
Upvotes: 3
Views: 129
Reputation: 176
One update: the behavior of re.sub changed in Python 3.7.
Empty matches for the pattern are replaced when adjacent to a previous non-empty match.
The result becomes "-a-b--d-" because there is now an empty match just before the "d". In earlier versions of Python, this empty match was not allowed because it is adjacent to the preceding match of "x".
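To check this on your own interpreter, here is a minimal sketch that lists where x* matches in 'abxd'. The variable names are only illustrative, and the commented result assumes Python 3.7 or later.

```python
import re

pattern = re.compile('x*')

# Collect the (start, end) span of every match in the string.
# A span whose start equals its end is an empty match.
spans = [m.span() for m in pattern.finditer('abxd')]
print(spans)

# Each match, empty or not, is replaced by '-'.
result = pattern.sub('-', 'abxd')
print(result)  # -a-b--d- on Python 3.7 and later
```

The span (3, 3) is the empty match between "x" and "d" that re.sub now also replaces.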
Upvotes: 0
Reputation: 239573
The * metacharacter matches 0 or more times. So,

     a bx d
    ^ ^ -- ^

^ is a position where x* matches 0 times and -- is the place where x* matches 1 time. That is why the output is -a-b-d-.
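A small sketch of the two cases described above, zero repetitions and one repetition, using Pattern.match with an explicit start position. The names first and at_x are just for illustration.

```python
import re

pattern = re.compile('x*')

# At index 0 the next character is 'a', so x* matches zero characters there.
first = pattern.match('abxd')
print(first.span())         # (0, 0)
print(repr(first.group()))  # ''

# Starting at index 2, x* consumes the single 'x'.
at_x = pattern.match('abxd', 2)
print(at_x.span())          # (2, 3)
print(repr(at_x.group()))   # 'x'
```

A zero-length match still counts as a match, which is why sub inserts a '-' at those positions.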
To get the output ab-d, you need to use x+ in the regular expression. It matches one or more times, so it will match only at the following position:

    abxd
      ^
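As a quick check of the x+ version, here is a short sketch. The second input string 'abxxxd' is an extra example that is not part of the original question.

```python
import re

# x+ needs at least one 'x', so there are no empty matches to replace.
single = re.sub('x+', '-', 'abxd')
print(single)   # ab-d

# The only match is the 'x' itself.
matches = re.findall('x+', 'abxd')
print(matches)  # ['x']

# A run of several x characters is one match and gets one replacement.
run = re.sub('x+', '-', 'abxxxd')
print(run)      # ab-d
```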
Upvotes: 3