Reputation: 462
Just looking for some confirmation on this, but it appears that the index/position numbers reported for regular expression matches do not follow the same rules used in the rest of Python.
Example:
import re
pattern = re.compile('<HTML>')
pattern.search("<HTML>")
output:
<_sre.SRE_Match object; span=(0, 6), match='<HTML>'>
Why is "span=(0, 6)"?
In Python, the string "<HTML>" is only 6 characters long, so its valid indices are 0 through 5, and it raises an IndexError when attempting something like:
"<HTML>"[6]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: string index out of range
So I'm fairly certain the answer is that the span value for match objects works differently from index values for Python data structures. While the span starts at 0 for the first character (as with all Python data structures), its end value is one past the last matched character, so the last character is always at end - 1.
If anyone can confirm my assumption and maybe explain why this difference exists I would greatly appreciate it.
Upvotes: 1
Views: 411
Reputation: 15175
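Well, a slice (span) in Python is half-open: the start index is included and the end index is excluded. So "<HTML>and much more"[0:6] actually returns "<HTML>". The span of a match object follows the same convention, so it can be used directly as slice bounds.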
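A minimal sketch of that point, reusing the pattern from the question (the variable names here are only illustrative): the span returned by the match can be fed straight into a slice, and the end value is exclusive in both cases.

```python
import re

text = "<HTML>and much more"
match = re.compile("<HTML>").search(text)

# span() returns (start, end); start is inclusive, end is exclusive
start, end = match.span()

# Slicing uses the same half-open convention, so this recovers the match
matched = text[start:end]

print(start, end)                    # 0 6
print(matched)                       # <HTML>
print(matched == match.group())      # True
print(end - start == len(matched))   # True
```

Because the end is exclusive, end - start equals the length of the match, and text[end] is the first character after it rather than the last character of it.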
Upvotes: 2