Python Regular Expression Index Numbers

Question

Just looking for some confirmation on this, but it appears that the index/position numbers for regular expression matches do not follow the same rules used in the rest of Python.

Example:

import re
pattern = re.compile('')
pattern.search("")

output:

<_sre.SRE_Match object; span=(0, 6), match=''>

Why is "span=(0, 6)"?

In Python, the string "" is only 6 characters long and would therefore raise an IndexError when attempting to do something like:

""[6]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: string index out of range
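The error above comes from indexing, not slicing, and the two follow different rules. A minimal sketch, assuming a hypothetical 6-character string s = "abcdef" (the string from the post did not survive), shows that 6 is out of range as an index but is a valid end bound for a slice:

```python
# Hypothetical 6-character string standing in for the one missing from the post
s = "abcdef"

# Indexing: valid positions run from 0 to len(s) - 1
last_char = s[len(s) - 1]  # 'f'

try:
    s[6]  # one past the last character
except IndexError as exc:
    index_error = str(exc)  # 'string index out of range'

# Slicing: the end bound is exclusive, so len(s) is a valid stop value
whole = s[0:6]  # 'abcdef'
print(last_char, index_error, whole)
```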

So I'm fairly certain the answer is that the span value of a match object is inherently different from the index values of Python data structures. While the span starts at 0 for the first character (as with all Python data structures), the last matched character is always at end - 1, where end is the second number in the span.

If anyone can confirm my assumption and maybe explain why this difference exists, I would greatly appreciate it.
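The observation about the last character can be checked directly. The second span value is what m.end() returns; note that m.endpos is a different attribute, the end of the region that was searched, not the end of the match. A minimal sketch, assuming a hypothetical pattern "python" and text "python rocks", since the pattern and text in the question are missing:

```python
import re

# Hypothetical pattern and text; the ones from the question did not survive
text = "python rocks"
m = re.compile("python").search(text)

start, end = m.span()   # (0, 6), the same as (m.start(), m.end())
last = text[end - 1]    # 'n': the last matched character sits at end - 1
region_end = m.endpos   # 12: end of the searched region, not of the match
print(start, end, last, region_end)
```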

RedX · Accepted Answer

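Well, a slice (span) in Python is half-open: it includes the start index and excludes the end index. So "and much more"[0:6] actually returns "and mu", the characters at indices 0 through 5.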
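Because the span is half-open, it can be used directly as slice bounds, and the match length is simply end - start with no +1 correction. A minimal sketch reusing the string from the answer, with a hypothetical pattern "much":

```python
import re

text = "and much more"
m = re.search(r"much", text)

start, end = m.span()                  # (4, 8)
# Slicing with the span reproduces exactly the matched text
same = text[start:end] == m.group()    # True
# Length of the match is end - start
length = end - start                   # 4
# An empty match has start == end, which a closed interval could not express
empty = re.search(r"x*", text)         # matches '' at position 0
print(m.span(), same, length, empty.span())
```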
