rex sphinx

Reputation: 31

Extracting time with regex from a string

I have scraped some data that includes opening hours in 12-hour format. The string looks like this: Mon - Fri:,10:00 am - 7:00 pm. I need to extract the times 10:00 am and 7:00 pm and convert them to 24-hour format. The final string I want is:

Mon - Fri:,10:00 - 19:00

Any help would be appreciated in this regard. I have tried the following:

import re

txt = 'Mon - Fri:,10:00 am - 7:00 pm'
data = re.findall(r'\s(\d{2}\:\d{2}\s?(?:AM|PM|am|pm))', txt)
print(data)

But neither this regex nor any other I tried matched the times.

Upvotes: 2

Views: 2355

Answers (4)

ggorlen

Reputation: 56865

Your regex requires whitespace before the leading digit, which prevents ,10:00 am from matching, and it requires two digits before the colon, which fails to match 7:00 pm. r"(?i)(\d?\d:\d\d (?:a|p)m)" seems like the most precise option.
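
To see both failures concretely, the short check below runs the question's pattern and the suggested one against the sample string. The variable names are only illustrative.

```python
import re

txt = "Mon - Fri:,10:00 am - 7:00 pm"

# Pattern from the question: needs whitespace before the time
# and exactly two hour digits.
original = r"\s(\d{2}\:\d{2}\s?(?:AM|PM|am|pm))"
# Suggested pattern: one or two hour digits, no leading whitespace,
# case-insensitive am/pm.
suggested = r"(?i)(\d?\d:\d\d (?:a|p)m)"

original_matches = re.findall(original, txt)
suggested_matches = re.findall(suggested, txt)

print(original_matches)   # []
print(suggested_matches)  # ['10:00 am', '7:00 pm']
```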

After that, parse the match using datetime.strptime and convert it to military time using the "%H:%M" format string. Any invalid time like 10:67 will raise a ValueError (if you anticipate strings that should be ignored, adjust the regex to strictly match valid 12-hour times).

import re
from datetime import datetime

def to_military_time(x):
    return datetime.strptime(x.group(), "%I:%M %p").strftime("%H:%M")

txt = "Mon - Fri:,10:00 am - 7:00 pm"
data = re.sub(r"(?i)(\d?\d:\d\d (?:a|p)m)", to_military_time, txt)
print(data) # => Mon - Fri:,10:00 - 19:00
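
If the scraped data may contain invalid times and you would rather not have the whole substitution fail, one option is to catch the ValueError inside the callback. This is only a sketch: to_military_time_safe is a hypothetical helper, and leaving the bad match unchanged is an assumption about what you want to happen.

```python
import re
from datetime import datetime

TIME_RE = re.compile(r"(?i)(\d?\d:\d\d (?:a|p)m)")

def to_military_time_safe(match):
    # Hypothetical variant of to_military_time: return the matched text
    # unchanged when strptime rejects it (for example minutes > 59).
    try:
        return datetime.strptime(match.group(), "%I:%M %p").strftime("%H:%M")
    except ValueError:
        return match.group()

good = TIME_RE.sub(to_military_time_safe, "Mon - Fri:,10:00 am - 7:00 pm")
bad = TIME_RE.sub(to_military_time_safe, "Sat:,10:67 am - 7:00 pm")
print(good)  # Mon - Fri:,10:00 - 19:00
print(bad)   # Sat:,10:67 am - 19:00
```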

Upvotes: 3

Ionut Ticus

Reputation: 2789

Why not use the time module?

import time
data = "Mon - Fri:,10:00 am - 7:00 pm"
parts = data.split(",")
days = parts[0]
hours = parts[1]
parts = hours.split("-")
t1 = time.strptime(parts[0].strip(), "%I:%M %p")
t2 = time.strptime(parts[1].strip(), "%I:%M %p")
result = days + "," + time.strftime("%H:%M", t1) + " - " + time.strftime("%H:%M", t2)
print(result)

Output:

Mon - Fri:,10:00 - 19:00

Upvotes: 1

Chih Sean Hsu

Reputation: 433

The regex needs to be changed, for example like this:

import re

text = 'Mon - Fri:,10:00 am - 7:00 pm'
result = re.match(r'\D* - \D*:,([\d\s\w:]+) - ([\d\s\w:]+)', text)
print(result.group(1))
# it will print 10:00 am
print(result.group(2))
# it will print 7:00 pm

You need a quantifier such as '+' or '*' to tell the regex to match multiple characters. On its own, a token like \s matches only one character.
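
A small illustration of the difference, using the same character class as above (the variable names are only for the example):

```python
import re

text = "10:00 am"

# Without a quantifier, the character class matches exactly one character.
single = re.match(r"[\d\s\w:]", text).group()
# With +, the same class matches a run of one or more characters.
run = re.match(r"[\d\s\w:]+", text).group()

print(single)  # 1
print(run)     # 10:00 am
```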

You can learn more about regex here:

https://regexr.com/

And you can try out regexes online here:

https://regex101.com/

Upvotes: 1

tmrlvi

Reputation: 2361

Your regex looks only for two-digit hours (\d{2}) preceded by whitespace (\s). The following also captures one-digit hours and allows a comma in place of the whitespace.

data = re.findall(r'[\s,](\d{1,2}\:\d{2}\s?(?:AM|PM|am|pm))', txt)

However, you might want to consider all punctuation as valid:

data = re.findall(r'[\s!"#$%&\'\(\)*+,-./:;\<=\>?@\[\\\]^_`\{|\}~](\d{1,2}\:\d{2}\s?(?:AM|PM|am|pm))', txt)
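
As a possible alternative to listing every punctuation character, here is a sketch of a different technique: a negative lookbehind, (?<!\d), which only requires that the time is not preceded by another digit. This is not the pattern above, and unlike it this one also matches a time at the very start of the string.

```python
import re

txt = "Mon - Fri:,10:00 am - 7:00 pm"

# (?<!\d) asserts that the character before the hour is not a digit,
# so any punctuation, whitespace, or the start of the string is accepted.
pattern = r"(?<!\d)(\d{1,2}:\d{2}\s?(?:AM|PM|am|pm))"

times = re.findall(pattern, txt)
print(times)  # ['10:00 am', '7:00 pm']
```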

Upvotes: 1
