Reputation: 363
I am looking to match the following time formats using regular expressions in Python, and mark True or False depending on whether a match is found in a line. Sample text is below. How can I achieve this task using only regular expressions?
One can observe that the '_am - _pm' and '_am-_pm' patterns are consistent in every notation. Matching the colons and the number format, with optional spaces, is what I have been attempting to do. Below is what I found from here:
HH:MM 12-hour format, optional leading 0, mandatory meridiems (AM/PM)
/((1[0-2]|0?[1-9]):([0-5][0-9]) ?([AaPp][Mm]))/
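As a quick check of how far this pattern gets on its own, here is a minimal sketch that runs it against the time fragments from the sample text. The names `found`, `samples` and `results` are only illustrative, and the surrounding `/` delimiters are dropped because Python's `re` does not use them.

```python
import re

# The pattern quoted above, without the surrounding / delimiters.
found = re.compile(r'((1[0-2]|0?[1-9]):([0-5][0-9]) ?([AaPp][Mm]))')

# Time fragments taken from the sample lines, plus the phone number.
samples = ['2am-8pm', '2:00am - 8:00pm', '08:00am-05:00pm', '983-765-0976']

# True when the pattern finds a single HH:MM am/pm time anywhere in the string.
results = {s: bool(found.search(s)) for s in samples}
for s, ok in results.items():
    print(s, ok, sep=' : ')
```

Because the minutes are mandatory here, '2am-8pm' is not matched, and the pattern only looks for a single time rather than a 'time - time' range.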
Sample text:
Lorem Ipsum is dummy text of the printing and typesetting industry between 2am-8pm.
Contrary to popular belief, Lorem Ipsum is not simply random text.
Lorem has been the industry between 2:00am - 8:00pm standard dummy text since the 1500s.
It has survived not only five centuries, but also between 08:00am-05:00pm
It was popularised from 5:30am - 8:59pm with the release of Letraset sheets.
More recently with desktop publishing software like Aldus PageMaker 983-765-0976.
Desired output:
Lorem Ipsum is dummy text of the printing and typesetting industry between 2am-8pm. : True
Contrary to popular belief, Lorem Ipsum is not simply random text. : False
Lorem has been the industry between 2:00am - 8:00pm standard dummy text since the 1500s. : True
It has survived not only five centuries, but also between 08:00am-05:00pm : True
It was popularised from 5:30am - 8:59pm with the release of Letraset sheets. : True
More recently with desktop publishing software like Aldus PageMaker 983-765-0976. : False
Upvotes: 2
Views: 642
Reputation: 626699
You may use
(?i)(?<!\d)(?:1[0-2]|0?[1-9])(?::(?:[0-5][0-9]))?\s?[ap]m\s*-\s*(?:1[0-2]|0?[1-9])(?::(?:[0-5][0-9]))?\s?[ap]m\b
See the regex demo
Details
(?i) - case-insensitive mode on
(?<!\d) - no digit is allowed immediately before the match
(?:1[0-2]|0?[1-9])(?::(?:[0-5][0-9]))? - time pattern:
    (?:1[0-2]|0?[1-9]) - 1 to 12, with an optional leading 0 before the digits 1-9
    (?::(?:[0-5][0-9]))? - an optional minute sequence with a : separator
\s? - an optional whitespace
[ap]m - a or p and then m
\s*-\s* - a hyphen enclosed with 0+ whitespaces
(?:1[0-2]|0?[1-9])(?::(?:[0-5][0-9]))?\s?[ap]m - the same time pattern as above
\b - word boundary.
import re
# A single time: hour 1-12, optional :MM, optional whitespace, then am/pm
time = r'(?:1[0-2]|0?[1-9])(?::(?:[0-5][0-9]))?\s?[ap]m'
# Two times joined by a hyphen; {0} is filled with the time pattern twice
pattern = re.compile(r'(?i)(?<!\d){0}\s*-\s*{0}\b'.format(time))
texts = ['Lorem Ipsum is dummy text of the printing and typesetting industry between 2am-8pm.',
         'Contrary to popular belief, Lorem Ipsum is not simply random text.',
         'Lorem has been the industry between 2:00am - 8:00pm standard dummy text since the 1500s.',
         'It has survived not only five centuries, but also between 08:00am-05:00pm',
         'It was popularised from 5:30am - 8:59pm with the release of Letraset sheets.',
         'More recently with desktop publishing software like Aldus PageMaker 983-765-0976.']
for text in texts:
    print(text, bool(pattern.search(text)), sep=" : ")
Output:
Lorem Ipsum is dummy text of the printing and typesetting industry between 2am-8pm. : True
Contrary to popular belief, Lorem Ipsum is not simply random text. : False
Lorem has been the industry between 2:00am - 8:00pm standard dummy text since the 1500s. : True
It has survived not only five centuries, but also between 08:00am-05:00pm : True
It was popularised from 5:30am - 8:59pm with the release of Letraset sheets. : True
More recently with desktop publishing software like Aldus PageMaker 983-765-0976. : False
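If the one-line pattern is hard to read, the same expression can be spelled out with re.VERBOSE so that each part carries its own comment. This is only a sketch of an alternative layout, not a different regex; the names `TIME`, `RANGE`, `has_time_range` and `checks` are made up for illustration.

```python
import re

# Illustrative names; the pattern is the same as above, laid out with re.VERBOSE
# so whitespace and # comments inside the pattern are ignored by the engine.
TIME = r'''
    (?:1[0-2]|0?[1-9])      # hour 1 to 12, optional leading 0 before 1-9
    (?::[0-5][0-9])?        # optional minutes, introduced by a colon
    \s?[ap]m                # optional whitespace, then am or pm
'''
RANGE = re.compile(
    r'(?<!\d)' + TIME + r'\s*-\s*' + TIME + r'\b',
    re.IGNORECASE | re.VERBOSE,
)

def has_time_range(line):
    """Return True if the line contains a 'time - time' range."""
    return bool(RANGE.search(line))

# A few edge cases for the lookbehind, the case flag and the word boundary.
checks = ['open 2am-8pm', '2:00AM - 8:00PM', '112am-8pm', '2am-8pmx', '983-765-0976']
for line in checks:
    print(line, has_time_range(line), sep=' : ')
```

Here '112am-8pm' is rejected because of the (?<!\d) lookbehind, and '2am-8pmx' is rejected because of the trailing \b; the first two strings match.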
Upvotes: 2