Reputation: 31
I have scraped some data, and some of the opening hours are in 12-hour format. The string looks like this: Mon - Fri:,10:00 am - 7:00 pm. I need to extract the times 10:00 am and 7:00 pm and convert them to 24-hour format. The final string I want looks like this:
Mon - Fri:,10:00 - 19:00
Any help would be appreciated in this regard. I have tried the following:
import re
txt = 'Mon - Fri:,10:00 am - 7:00 pm'
data = re.findall(r'\s(\d{2}\:\d{2}\s?(?:AM|PM|am|pm))', txt)
print(data)
But neither this regex nor any other I tried did the job.
Upvotes: 2
Views: 2355
Reputation: 56865
Your regex requires whitespace before the leading digit, which prevents ,10:00 am from matching, and requires two digits before the colon, which fails to match 7:00 pm. r"(?i)(\d?\d:\d\d (?:a|p)m)" seems like the most precise option.
After that, parse the match using datetime.strptime and convert it to military time using the "%H:%M" format string. Any invalid times like 10:67 will raise a nice error (if you anticipate strings that should be ignored, adjust the regex to strictly match valid 12-hour times).
import re
from datetime import datetime
def to_military_time(x):
    # x is the re.Match object that re.sub passes in for each match
    return datetime.strptime(x.group(), "%I:%M %p").strftime("%H:%M")
txt = "Mon - Fri:,10:00 am - 7:00 pm"
data = re.sub(r"(?i)(\d?\d:\d\d (?:a|p)m)", to_military_time, txt)
print(data) # => Mon - Fri:,10:00 - 19:00
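If you would rather leave an invalid time such as 10:67 am untouched instead of letting the error propagate, one option is to catch the ValueError inside the replacement function. This is only a sketch: the names TIME_12H, to_24h_or_keep and convert_hours are made up for illustration, and it assumes an English locale so that %p recognizes am/pm.

```python
import re
from datetime import datetime

# Same pattern as above, compiled once (TIME_12H is a hypothetical name).
TIME_12H = re.compile(r"(?i)\d?\d:\d\d (?:a|p)m")

def to_24h_or_keep(match):
    # Hypothetical helper: convert a 12-hour time to 24-hour format,
    # or return the matched text unchanged if strptime rejects it.
    text = match.group()
    try:
        return datetime.strptime(text, "%I:%M %p").strftime("%H:%M")
    except ValueError:
        return text

def convert_hours(line):
    # Replace every 12-hour time found in the line.
    return TIME_12H.sub(to_24h_or_keep, line)

print(convert_hours("Mon - Fri:,10:00 am - 7:00 pm"))  # Mon - Fri:,10:00 - 19:00
print(convert_hours("Sat:,10:67 am - 12:00 am"))       # Sat:,10:67 am - 00:00
```

Whether silently keeping the bad value is acceptable depends on the data; raising, as in the version above, makes scraping problems visible earlier.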
Upvotes: 3
Reputation: 2789
Why not use the time module?
import time
data = "Mon - Fri:,10:00 am - 7:00 pm"
parts = data.split(",")
days = parts[0]
hours = parts[1]
parts = hours.split("-")
t1 = time.strptime(parts[0].strip(), "%I:%M %p")
t2 = time.strptime(parts[1].strip(), "%I:%M %p")
result = days + "," + time.strftime("%H:%M", t1) + " - " + time.strftime("%H:%M", t2)
print(result)
Output:
Mon - Fri:,10:00 - 19:00
Upvotes: 1
Reputation: 433
The regex needs to change, like this:
import re
text = 'Mon - Fri:,10:00 am - 7:00 pm'
result = re.match(r'\D* - \D*:,([\d\s\w:]+) - ([\d\s\w:]+)', text)
print(result.group(1))
# it will print 10:00 am
print(result.group(2))
# it will print 7:00 pm
You need a quantifier such as '+' or '*' to tell the regex to match multiple characters; on its own, \s matches only a single character.
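A quick way to see the difference, using made-up sample strings (single_space and many_spaces are illustrative names only):

```python
import re

# Without a quantifier, \s matches exactly one whitespace character.
single_space = re.fullmatch(r"10:00\sam", "10:00  am")

# With +, \s+ matches one or more whitespace characters.
many_spaces = re.fullmatch(r"10:00\s+am", "10:00  am")

print(single_space)          # None, because there are two spaces
print(many_spaces.group())   # the whole string matches
```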
You can learn more regex here.
And here you can try regex online.
Upvotes: 1
Reputation: 2361
Your regex looks only for two-digit hours (\d{2}) with whitespace before them (\s). The following also captures one-digit hours, with a possible comma instead of the space.
data = re.findall(r'[\s,](\d{1,2}\:\d{2}\s?(?:AM|PM|am|pm))', txt)
However, you might want to consider all punctuation as valid:
data = re.findall(r'[\s!"#$%&\'\(\)*+,-./:;\<=\>?@\[\\\]^_`\{|\}~](\d{1,2}\:\d{2}\s?(?:AM|PM|am|pm))', txt)
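As an alternative to listing every character that may precede the time, a negative lookbehind can simply require that the time is not preceded by another digit. This is a different technique from the character class above, shown only as a sketch; find_times is a hypothetical helper name. Unlike the character-class version, it also matches a time at the very start of the string.

```python
import re

# (?<!\d) asserts that the character before the match is not a digit,
# so ",10:00 am", " 7:00 pm" and a time at the start of the string all match.
TIME_PATTERN = re.compile(r"(?<!\d)(\d{1,2}:\d{2}\s?(?:am|pm))", re.IGNORECASE)

def find_times(text):
    # Hypothetical helper: return all 12-hour times found in text.
    return TIME_PATTERN.findall(text)

txt = "Mon - Fri:,10:00 am - 7:00 pm"
print(find_times(txt))  # ['10:00 am', '7:00 pm']
```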
Upvotes: 1