I am trying to understand how pos, endpos, and slicing behave when using a RegexObject in Python.
My code is as follows:
>>> import re
>>> pat=re.compile(r'^abcd')
# Starting search from index 2.
>>> print(pat.match('..abcd',2))
None
# Slicing gives a new string "abcd", hence a match for ^ is found.
>>> pat.match('..abcd'[2:])
<_sre.SRE_Match object; span=(0, 4), match='abcd'>
>>> pat=re.compile(r'abcd$')
# How does $ match at the end here?
>>> pat.match('abcd..',0,4)
<_sre.SRE_Match object; span=(0, 4), match='abcd'>
# Slicing gives a new string "abcd", hence a match for $ is found.
>>> pat.match('abcd..'[:4])
<_sre.SRE_Match object; span=(0, 4), match='abcd'>
My question: since the string 'abcd..' is not sliced in pat.match('abcd..',0,4), how does $ match at endpos?
The docs for the match method say:

"The optional pos and endpos parameters have the same meaning as for the search() method."

So refer to the search method, which says:

"The optional parameter endpos limits how far the string will be searched; it will be as if the string is endpos characters long, so only the characters from pos to endpos - 1 will be searched for a match. If endpos is less than pos, no match will be found; otherwise, if rx is a compiled regular expression object, rx.search(string, 0, 50) is equivalent to rx.search(string[:50], 0)."
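A quick check of that equivalence using the pattern and string from the question (the variable names are just for illustration). It shows that endpos=4 and the slice [:4] produce the same span, then loops over every possible endpos to compare search() with endpos against search() on the sliced string:

```python
import re

# Pattern from the question: "abcd" anchored to the end of the string.
pat = re.compile(r'abcd$')
s = 'abcd..'

# endpos=4: the engine treats s as if it were only 4 characters long,
# so '$' is satisfied at index 4 even though '..' follows in the real string.
with_endpos = pat.match(s, 0, 4)
with_slice = pat.match(s[:4])
print(with_endpos.span(), with_slice.span())  # (0, 4) (0, 4)

# Without endpos the real end of s is index 6, so '$' cannot match after 'abcd'.
print(pat.match(s))  # None

# The documented equivalence rx.search(string, 0, n) == rx.search(string[:n], 0),
# checked for every n. Match objects don't compare equal, so compare spans;
# a Match is always truthy, so `m and m.span()` yields None or the span.
for n in range(len(s) + 1):
    a = pat.search(s, 0, n)
    b = pat.search(s[:n], 0)
    assert (a and a.span()) == (b and b.span())
print('equivalent for every endpos')
```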
Providing an endpos of 4 is equivalent to slicing the string to a length of 4, so endpos is considered the new end of the string, and $ matches there. This is a bizarre contrast to the interaction of pos and ^, which explicitly does not work that way:

"the '^' pattern character matches at the real beginning of the string and at positions just after a newline, but not necessarily at the index where the search is to start."
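A small sketch contrasting pos with slicing for ^, again using the question's '..abcd' (variable names are illustrative). One detail is added beyond the quoted docs: without re.MULTILINE, ^ only matches at the real beginning of the string, so the "just after a newline" case applies in MULTILINE mode. The sketch also shows that match() already anchors at pos, so a pattern without ^ matches starting at index 2:

```python
import re

s = '..abcd'
start_pat = re.compile(r'^abcd')

# pos=2 does not move '^': index 2 is not the real beginning of s.
print(start_pat.match(s, 2))            # None

# Slicing builds a new string whose real beginning is the 'a', so '^' matches.
print(start_pat.match(s[2:]).span())    # (0, 4)

# match() already anchors at pos, so without '^' the pattern matches at index 2.
plain_pat = re.compile(r'abcd')
print(plain_pat.match(s, 2).span())     # (2, 6)

# "Positions just after a newline" only count for '^' in MULTILINE mode.
t = '..\nabcd'
print(start_pat.match(t, 3))            # None
multi_pat = re.compile(r'^abcd', re.MULTILINE)
print(multi_pat.match(t, 3).span())     # (3, 7)
```

So if the goal is "match starting exactly at pos", passing a pattern without ^ to match() does that; ^ is only useful when you really mean the start of the whole string (or of a line, with MULTILINE).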