monstermac77

Reputation: 330

Python's re.split() not removing all matched characters

This is driving me absolutely nuts. I am positive that the entire date range at the start of the string is being matched by the regex. Yet when I call re.split, an 8 is left behind. What's going on here, and how can I split on that date range (in some cases it might appear both at the start and in the middle of the string, hence the split)?

import re
a = "09/05/2018-12/18/2018 Lecture Wednesday 01:30PM - 02:45PM, Room to be Announced"
b = r"([0-9]|\/|-){21}"
print(re.split(b, a))

Result

['', '8', ' Lecture Wednesday 01:30PM - 02:45PM, Room to be Announced']

Upvotes: 4

Views: 1767

Answers (1)

Tim Peters

Reputation: 70705

From the docs for re.split:

If capturing parentheses are used in pattern, then the text of all groups in the pattern are also returned as part of the resulting list.

You do have a capturing group. Because the group is repeated 21 times, it keeps only the text of its final repetition, and the last thing it matches is the character 8. That's why 8 is returned.
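To see this directly, here is a small sketch that inspects the match object for the question's own pattern and string. The variable names `m`, `whole_match` and `last_capture` are just for illustration.

```python
import re

a = "09/05/2018-12/18/2018 Lecture Wednesday 01:30PM - 02:45PM, Room to be Announced"
b = r"([0-9]|\/|-){21}"

m = re.match(b, a)
whole_match = m.group(0)   # everything the pattern consumed
last_capture = m.group(1)  # only the final repetition of the repeated group

print(whole_match)     # 09/05/2018-12/18/2018
print(last_capture)    # 8
print(re.split(b, a))  # the captured '8' is inserted between the split pieces
```

So the whole date range really is matched and removed; the 8 in the result is the group's captured text being added back in, not a leftover from the string.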

You can use a non-capturing group instead:

>>> b = r"(?:[0-9]|\/|-){21}"
           ^^ note these two characters added
>>> re.split(b, a)
['', ' Lecture Wednesday 01:30PM - 02:45PM, Room to be Announced']

Or you could put all the choices in a single character class, and not need a group at all:

>>> b = r"[-/0-9]{21}"
>>> re.split(b, a)
['', ' Lecture Wednesday 01:30PM - 02:45PM, Room to be Announced']

Upvotes: 2
