Reputation: 85
I'm writing a simple regex to find dates and times within strings. There's a small issue with identifying times when there are specific dates in the string. Here's the regex:
TIME_REGEX = "([0-1][0-9]|2[0-3])[:\-\_]?([0-5][0-9])[:\-\_]?([0-5][0-9])"
The issue is that I need to accept time values without anything between the numbers, hence the two "[:\-\_]?" parts. However, the regex also matches when the two separators differ from each other, so it will match the date "2011-07-30" as the time 20:11:07.
Can I change the regex so that both separators between the numbers must be the same, so that it matches "201107" and "20-11-07", but not "2011-07" or "20:11-07"?
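The false positive can be reproduced with a short script. This is a minimal sketch using the regex above, written as a raw string; the sample strings are only illustrative:

```python
import re

# The original pattern: each separator is optional and matched independently.
TIME_REGEX = r"([0-1][0-9]|2[0-3])[:\-\_]?([0-5][0-9])[:\-\_]?([0-5][0-9])"

for text in ["201107", "20-11-07", "2011-07-30", "20:11-07"]:
    m = re.search(TIME_REGEX, text)
    # All four strings match, including the date and the mixed separators.
    print(text, "->", m.groups() if m else None)
```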
Upvotes: 1
Views: 103
Reputation: 6526
I suggest capturing the first separator in a group and using a backreference to that group to match the second separator, as follows. You just have to retrieve the correct groups at the end:
import re

times = ['20-11-07', '2011-07', '20-1107', '201107', '20:11-07', '20-10:07', '20:11:07']
# Group 2 captures the first separator; (\2) must repeat exactly the same text.
TIME_REGEX = r'([0-1][0-9]|2[0-3])([:\-\_]*)([0-5][0-9])(\2)([0-5][0-9])'

for time in times:
    m = re.search(TIME_REGEX, time)
    if m:
        # Groups 1, 3 and 5 hold hours, minutes and seconds.
        print(time, "matches with following groups:", m.group(1), m.group(3), m.group(5))
    else:
        print(time, "does not match")
# 20-11-07 matches with following groups: 20 11 07
# 2011-07 does not match
# 20-1107 does not match
# 201107 matches with following groups: 20 11 07
# 20:11-07 does not match
# 20-10:07 does not match
# 20:11:07 matches with following groups: 20 11 07
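One caveat with `re.search`: the pattern may start anywhere in the string, so the full date "2011-07-30" still yields a match on "11-07-30". A possible way around this, sketched here under the assumption that a time should never begin or end inside a longer run of digits, is to add digit lookarounds. The names `STRICT_TIME_REGEX` and `find_time` are hypothetical, and the separator is limited to at most one character:

```python
import re

# The pattern from the answer above, for comparison.
LOOSE_TIME_REGEX = r'([0-1][0-9]|2[0-3])([:\-\_]*)([0-5][0-9])(\2)([0-5][0-9])'

# Hypothetical stricter variant: (?<!\d) and (?!\d) reject matches that
# start or end in the middle of a longer digit run.
STRICT_TIME_REGEX = r'(?<!\d)([0-1][0-9]|2[0-3])([:\-_]?)([0-5][0-9])\2([0-5][0-9])(?!\d)'

def find_time(text):
    """Return (hours, minutes, seconds) as strings, or None if nothing matches."""
    m = re.search(STRICT_TIME_REGEX, text)
    return (m.group(1), m.group(3), m.group(4)) if m else None

# The loose pattern finds "11-07-30" inside the date.
print(re.search(LOOSE_TIME_REGEX, '2011-07-30').group(1, 3, 5))
# The strict pattern rejects the date but still finds a real time.
print(find_time('2011-07-30'))
print(find_time('logged at 20-11-07'))
```

Whether the lookarounds are appropriate depends on the input: they would also reject a time glued directly to other digits.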
Upvotes: 1
Reputation: 1246
You can store the delimiter in a group and reuse it:
TIME_REGEX = r"([0-1][0-9]|2[0-3])(?P<sep>[:\-\_]?)([0-5][0-9])(?P=sep)([0-5][0-9])"
Here, (?P<sep>...) stores the content of this group under the name sep, which we reuse with (?P=sep). This way, both separators always have to be equal.
Example:
import re

for test in ['201107', '20-11-07', '20-11:07']:
    match = re.match(TIME_REGEX, test)
    if match:
        # Groups 1, 3 and 4 are hours, minutes and seconds; group 2 is 'sep'.
        print(test, match.group(1, 3, 4), "delimiter: '{}'".format(match.group('sep')))
yields (nothing is printed for '20-11:07', because its separators differ):
201107 ('20', '11', '07') delimiter: ''
20-11-07 ('20', '11', '07') delimiter: '-'
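If counting group indices feels fragile, the remaining groups can be named as well. The following is only a sketch of that idea: the group names `hour`, `minute` and `second` and the helper `parse_time` are hypothetical and not part of the regex above.

```python
import re

# Every part gets a name; (?P=sep) still forces both separators to be equal.
NAMED_TIME_REGEX = (
    r"(?P<hour>[0-1][0-9]|2[0-3])"
    r"(?P<sep>[:\-_]?)"
    r"(?P<minute>[0-5][0-9])"
    r"(?P=sep)"
    r"(?P<second>[0-5][0-9])"
)

def parse_time(text):
    """Return a dict of the named groups, or None if text does not start with a time."""
    match = re.match(NAMED_TIME_REGEX, text)
    return match.groupdict() if match else None

print(parse_time('20-11-07'))
print(parse_time('201107'))
print(parse_time('20-11:07'))
```

Because `re.match` only tries the start of the string, this sketch also rejects "2011-07-30".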
Upvotes: 1