Rahul

Reputation: 2748

How does $ appear at end?

I am trying to understand how pos and endpos differ from slicing when using a compiled regex object (RegexObject) in Python.

My code is as follows:

>>> import re
>>> pat=re.compile(r'^abcd')

# Starting search from index 2.
>>> print(pat.match('..abcd',2))   
None

# Slicing gives a new string "abcd" hence a match for ^ is found.
>>> pat.match('..abcd'[2:]) 
<_sre.SRE_Match object; span=(0, 4), match='abcd'>

>>> pat=re.compile(r'abcd$')

# How does $ appear at end ?
>>> pat.match('abcd..',0,4)
<_sre.SRE_Match object; span=(0, 4), match='abcd'> 

# Slicing gives a new string "abcd", hence a match for $ is found.
>>> pat.match('abcd..'[:4])
<_sre.SRE_Match object; span=(0, 4), match='abcd'>

My question: the string 'abcd..' is not sliced in pat.match('abcd..',0,4), so how does $ match at endpos?

Upvotes: 1

Views: 85

Answers (1)

user2357112

Reputation: 282026

The match method docs:

The optional pos and endpos parameters have the same meaning as for the search() method.

refer to the search method, which says

The optional parameter endpos limits how far the string will be searched; it will be as if the string is endpos characters long, so only the characters from pos to endpos - 1 will be searched for a match. If endpos is less than pos, no match will be found; otherwise, if rx is a compiled regular expression object, rx.search(string, 0, 50) is equivalent to rx.search(string[:50], 0).
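As a rough illustration of the quoted paragraph, the sketch below checks the documented equivalence between endpos and slicing, and the "endpos less than pos" case. The pattern and the sample string are made up for the example and are not from the question.

```python
import re

# Hypothetical pattern and input, chosen only to illustrate endpos.
rx = re.compile(r"\d+$")
string = "abc123xyz"

# With endpos=6 the string is treated as if it were "abc123",
# so $ can match right after the digits.
limited = rx.search(string, 0, 6)

# The documented equivalent: slice first, then search from 0.
sliced = rx.search(string[:6], 0)

# Without endpos the digits are not at the end, so $ fails.
unlimited = rx.search(string)

# endpos smaller than pos: no match is found.
reversed_bounds = rx.search(string, 5, 2)

print(limited.span(), sliced.span())   # (3, 6) (3, 6)
print(unlimited, reversed_bounds)      # None None
```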

Providing an endpos of 4 is equivalent to slicing the string to a length of 4, so endpos is considered the new end of the string, and $ matches there. This is a bizarre contrast to the interaction of pos and ^, which explicitly does not work that way:

the '^' pattern character matches at the real beginning of the string and at positions just after a newline, but not necessarily at the index where the search is to start.
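The asymmetry with pos can be checked the same way. This is a minimal sketch using the strings from the question plus one made-up multiline input; it shows that ^ ignores pos unless pos happens to follow a newline in MULTILINE mode, and that match() without ^ already anchors at pos.

```python
import re

caret = re.compile(r"^abcd")

# pos=2 does not move the "real" beginning of the string, so ^ fails.
with_pos = caret.match("..abcd", 2)

# Slicing creates a new string whose real beginning is index 0.
with_slice = caret.match("..abcd"[2:])

# In MULTILINE mode ^ also matches just after a newline, even at pos > 0.
# The input "..\nabcd" is a made-up example.
caret_multiline = re.compile(r"^abcd", re.MULTILINE)
after_newline = caret_multiline.match("..\nabcd", 3)

# match() is already anchored at pos, so dropping ^ gives the
# "start matching at index 2" behaviour without slicing.
no_caret = re.compile(r"abcd").match("..abcd", 2)

print(with_pos)              # None
print(with_slice.span())     # (0, 4)
print(after_newline.span())  # (3, 7)
print(no_caret.span())       # (2, 6)
```

Note that spans from the pos-based calls are reported relative to the original string, while the sliced call reports positions relative to the slice.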

Upvotes: 2
