Erik

Reputation: 85

Regex: (date/time) same item between each part

I'm writing a simple regex to find dates and times within strings. There's a small issue with identifying time items when there are specific dates in the string. Here's the regex:

TIME_REGEX = r"([0-1][0-9]|2[0-3])[:\-\_]?([0-5][0-9])[:\-\_]?([0-5][0-9])"

The issue is that I need to accept time values with nothing between the numbers, hence the two "[:\-\_]?" parts. However, the regex also matches when the two separators differ from each other, or when only one of them is present. So it will also match the date "2011-07-30" as being the time 20:11:07.

Can I change the regex so that both separators between the numbers must be the same, so it matches "201107" and "20-11-07", but not "2011-07" or "20:11-07"?

Upvotes: 1

Views: 103

Answers (2)

Laurent H.

Reputation: 6526

I suggest capturing the first separator in a group and using a backreference to that group to match the second separator, as follows. You just have to retrieve the correct groups at the end:

import re

times = ['20-11-07', '2011-07', '20-1107', '201107', '20:11-07', '20-10:07', '20:11:07']

TIME_REGEX = r'([0-1][0-9]|2[0-3])([:\-\_]?)([0-5][0-9])(\2)([0-5][0-9])'

for time in times:
    m = re.search(TIME_REGEX, time)
    if m:
        print(time, "matches with following groups:", m.group(1), m.group(3), m.group(5))
    else:
        print(time, "does not match")

# 20-11-07 matches with following groups: 20 11 07
# 2011-07 does not match
# 20-1107 does not match
# 201107 matches with following groups: 20 11 07
# 20:11-07 does not match
# 20-10:07 does not match
# 20:11:07 matches with following groups: 20 11 07
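
One caveat the test list above does not cover: re.search scans the whole string, so the full date from the question can still produce a match that starts at a later offset. Below is a minimal sketch of one way to guard against that, assuming a time should never be directly preceded or followed by another digit. The lookarounds and the names BASE, GUARDED and find_time are my own additions for illustration, not part of the pattern above.

```python
import re

# Same backreference idea as above: group 2 captures the separator, \2 repeats it.
BASE = r'([0-1][0-9]|2[0-3])([:\-\_]?)([0-5][0-9])(\2)([0-5][0-9])'

# Assumption: a time must not touch another digit on either side.
GUARDED = r'(?<!\d)' + BASE + r'(?!\d)'

def find_time(pattern, text):
    """Return (hour, minute, second) for the first match in text, or None."""
    m = re.search(pattern, text)
    return (m.group(1), m.group(3), m.group(5)) if m else None

print(find_time(BASE, '2011-07-30'))            # finds 11-07-30 inside the date
print(find_time(GUARDED, '2011-07-30'))         # None
print(find_time(GUARDED, 'at 20:11:07 today'))  # ('20', '11', '07')
```

Whether digit-only guards are strict enough depends on your data. A standalone string such as "11-07-30" still looks like a valid time to this pattern, so you may need extra context to tell the two apart.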

Upvotes: 1

Dux

Reputation: 1246

You can store the delimiter in a group and reuse it:

TIME_REGEX = r"([0-1][0-9]|2[0-3])(?P<sep>[:\-\_]?)([0-5][0-9])(?P=sep)([0-5][0-9])"

Here, (?P<sep>...) stores the content of this group under the name sep, which we reuse with (?P=sep). This way, both delimiters always have to be equal.

Example:

import re

# Only inputs whose two delimiters are identical (or both absent) produce output.
for test in ['201107', '20-11-07', '20-11:07']:
    match = re.match(TIME_REGEX, test)
    if match:
        print(test, match.group(1, 3, 4), "delimiter: '{}'".format(match.group('sep')))

yields (the third input, '20-11:07', prints nothing because its delimiters differ):

201107 ('20', '11', '07') delimiter: ''
20-11-07 ('20', '11', '07') delimiter: '-'
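
If you want actual time objects rather than string groups, the same named backreference can be combined with named groups for the fields. This is only a sketch: the names TIME_RE and parse_time and the extra group names are hypothetical, and it assumes the whole input string is the time (hence fullmatch rather than match).

```python
import re
from datetime import time

# Same pattern as above, with the numeric fields named as well.
TIME_RE = re.compile(
    r"(?P<hour>[0-1][0-9]|2[0-3])"
    r"(?P<sep>[:\-_]?)"          # first delimiter, possibly empty
    r"(?P<minute>[0-5][0-9])"
    r"(?P=sep)"                  # must repeat exactly what sep captured
    r"(?P<second>[0-5][0-9])"
)

def parse_time(text):
    """Return a datetime.time if text is entirely a valid time, else None."""
    m = TIME_RE.fullmatch(text)
    if m is None:
        return None
    return time(int(m.group("hour")), int(m.group("minute")), int(m.group("second")))

for candidate in ["201107", "20-11-07", "20-11:07", "2011-07"]:
    print(candidate, "->", parse_time(candidate))
```

The first two inputs parse to 20:11:07, and the last two return None because the delimiters do not agree.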

Upvotes: 1
