I have a script that reads emails and pulls date-times out of the body. It was working fine until it received an email with the date-time in the format below:
06:00 Wednesday 22ndFebruary 2017
There is no space between "22nd" and "February", so when the function ran to extract the times from the body I got this error:
Traceback (most recent call last):
  File "email_processing.py", line 137, in <module>
    e_start_time, e_end_time = main_dt(content)
  File "email_processing.py", line 26, in main_dt
    date = dateutil.parser.parse(re.search(pattern, data).group(0))
  File "/usr/lib/python2.7/site-packages/dateutil/parser.py", line 1168, in parse
    return DEFAULTPARSER.parse(timestr, **kwargs)
  File "/usr/lib/python2.7/site-packages/dateutil/parser.py", line 559, in parse
    raise ValueError("Unknown string format")
ValueError: Unknown string format
My current function is below. Can anyone think of a way to validate the string, i.e. make sure there are spaces where there need to be, given that a missing space could occur anywhere in the string and that the dates will change as more emails come in?
Thanks
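import re
import dateutil.parser

def main_dt(data):
    dates = []
    for pattern in [r'(?<=Start Time & Date: ).*', r'(?<=Completion Time & Date: ).*']:
        text = re.search(pattern, data).group(0)
        try:
            date = dateutil.parser.parse(text)
        except ValueError:
            # Show the offending string, then re-raise rather than appending
            # a stale (or undefined) `date` left over from a previous pass.
            print(text)
            raise
        dates.append(date)
    return dates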
Sample body:
Dear Customer,
(Call Transferred) We are writing to inform you of planned engineering work taking place which could impact your service.
The affected site is : XXXXXX
Maintenance window:
Start Time & Date: 01:00 Wednesday 22nd February 2017
Completion Time & Date: 06:00 Wednesday 22ndFebruary 2017
Details of Work:
...
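Given the two sample lines above, "spaces where there need to be" can be written down as a strict regular expression and checked before parsing. A minimal sketch, assuming every date-time in these emails follows the "HH:MM Weekday DDth Month YYYY" layout; `EXPECTED_FORMAT` and `is_well_formed` are hypothetical names, not part of the original script:

```python
import re

# Hypothetical strict pattern for the layout seen in these emails:
# "HH:MM Weekday DDth Month YYYY", with exactly one space between fields.
EXPECTED_FORMAT = re.compile(
    r"^\d{2}:\d{2} [A-Z][a-z]+ \d{1,2}(?:st|nd|rd|th) [A-Z][a-z]+ \d{4}$"
)


def is_well_formed(text):
    """Return True if text matches the assumed date-time layout exactly."""
    # strip() drops trailing whitespace such as a '\r' left by the email body.
    return EXPECTED_FORMAT.match(text.strip()) is not None


print(is_well_formed("01:00 Wednesday 22nd February 2017"))
print(is_well_formed("06:00 Wednesday 22ndFebruary 2017"))
```

A check like this only detects a malformed string; something still has to repair it before `dateutil.parser.parse` sees it.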
This is more of a regular expression problem.
In the block where you catch the exception, reformat the malformed data with a substitution before parsing it again:
validation_pattern = r'(.*\d+[a-z]{2})([A-Z].*)'
try:
    date = dateutil.parser.parse(re.search(pattern, data).group(0))
except ValueError:
    # Split "22ndFebruary" into "22nd" (group 1) and "February ..." (group 2)
    # and join them again with a space.
    dirty_data_group = re.search(pattern, data).group(0)
    tidy_data_group = re.sub(validation_pattern, r'\1 \2', dirty_data_group)
    date = dateutil.parser.parse(tidy_data_group)
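To see what the substitution does on its own, here is a standalone sketch that applies the same `validation_pattern` to the failing line from the question and to the line that already parsed. It uses only the standard `re` module; the variable names `dirty` and `clean` are illustrative:

```python
import re

# Group 1 ends with digits plus a two-letter lowercase suffix ("22nd");
# group 2 starts at the capital letter glued to it ("February 2017").
validation_pattern = r'(.*\d+[a-z]{2})([A-Z].*)'

dirty = '06:00 Wednesday 22ndFebruary 2017'
tidy = re.sub(validation_pattern, r'\1 \2', dirty)
print(tidy)

# A well-formed string has no capital letter directly after the suffix,
# so the pattern does not match and the string comes back unchanged.
clean = '01:00 Wednesday 22nd February 2017'
print(re.sub(validation_pattern, r'\1 \2', clean) == clean)
```

Because well-formed strings pass through untouched, the substitution could also be applied unconditionally rather than only inside the `except` branch.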
This should put the correct date in the list every time. However, it only handles the specific problem you describe, i.e. no space between the day of the month and the month itself.
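For the more general case in the question (a missing space "anywhere in the string"), one option is to insert a space at every letter/digit and lowercase/uppercase boundary before parsing. The sketch below is an assumption-laden variation on the regex approach, not part of the answer above: it assumes the "HH:MM Weekday DDth Month YYYY" layout and English day and month names, and it swaps `dateutil` for the standard library's `datetime.strptime` so it runs without third-party packages. `normalise_spacing` and `parse_email_datetime` are hypothetical helper names:

```python
import re
from datetime import datetime

# Zero-width positions where a space is assumed to be missing:
#   lowercase -> uppercase  ("22ndFebruary"   -> "22nd February")
#   letter    -> digit      ("February2017"   -> "February 2017")
#   digit     -> uppercase  ("06:00Wednesday" -> "06:00 Wednesday")
MISSING_SPACE = re.compile(
    r'(?<=[a-z])(?=[A-Z])|(?<=[A-Za-z])(?=\d)|(?<=\d)(?=[A-Z])'
)
# Ordinal suffix after a day number, e.g. "22nd" -> "22".
ORDINAL_SUFFIX = re.compile(r'(\d{1,2})(?:st|nd|rd|th)\b')


def normalise_spacing(text):
    """Insert the assumed missing spaces, then collapse runs of whitespace."""
    spaced = MISSING_SPACE.sub(' ', text)
    return ' '.join(spaced.split())


def parse_email_datetime(text):
    """Parse an 'HH:MM Weekday DDth Month YYYY' string (hypothetical helper)."""
    cleaned = ORDINAL_SUFFIX.sub(r'\1', normalise_spacing(text))
    # %A and %B expect English day and month names (default C locale).
    return datetime.strptime(cleaned, '%H:%M %A %d %B %Y')


for raw in ['01:00 Wednesday 22nd February 2017',
            '06:00 Wednesday 22ndFebruary 2017',
            '06:00Wednesday22ndFebruary2017']:
    print(parse_email_datetime(raw))
```

All three sample strings should parse to 22 February 2017. If you would rather keep `dateutil`, passing the output of `normalise_spacing` to `dateutil.parser.parse` instead of `strptime` should also work, since the question shows it already parses the correctly spaced form.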