AlexW

Reputation: 2589

Python - recognize a date time string and make sure it's in a datetime readable format

I have a script that reads emails and pulls date-times out of the body. It was working fine until it received an email with the date-time formatted as below:

06:00 Wednesday 22ndFebruary 2017

There was no space between "22nd" and "February", so when the function ran to extract the times from the body I got this error:

Traceback (most recent call last):
  File "email_processing.py", line 137, in <module>
    e_start_time, e_end_time = main_dt(content)
  File "email_processing.py", line 26, in main_dt
    date = dateutil.parser.parse(re.search(pattern, data).group(0))
  File "/usr/lib/python2.7/site-packages/dateutil/parser.py", line 1168, in parse
    return DEFAULTPARSER.parse(timestr, **kwargs)
  File "/usr/lib/python2.7/site-packages/dateutil/parser.py", line 559, in parse
    raise ValueError("Unknown string format")
ValueError: Unknown string format

My current function is below. Can anyone think of a way to validate the string, or to make sure there are spaces where they are needed? A missing space could occur anywhere in the string, and the dates will obviously change as more emails come in.

Thanks

def main_dt(data):
    dates = []
    for pattern in ['(?<=Start Time & Date: ).*', '(?<=Completion Time & Date: ).*']:
        try:
            date = dateutil.parser.parse(re.search(pattern, data).group(0))
        except:
            print re.search(pattern, data).group(0)
        dates.append(date)
    return dates

Sample body:

Dear Customer,

(Call Transferred) We are writing to inform you of planned engineering work taking place which could impact your service.

The affected site is :  XXXXXX

Maintenance window:

Start Time & Date: 01:00 Wednesday 22nd February 2017               
Completion Time & Date: 06:00 Wednesday 22ndFebruary 2017                

Details of Work:
...

Upvotes: 1

Views: 287

Answers (1)

apurva.nandan

Reputation: 1101

This is more of a regular expression problem.

In the part where you catch the exception, reformat the malformed data and parse it again:

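    # Captures everything up to an ordinal day ("22nd") and the capitalised text that follows it.
    validation_pattern = r'(.*\d+[a-z]{2})([A-Z].*)'
    dirty_data_group = re.search(pattern, data).group(0)
    try:
        date = dateutil.parser.parse(dirty_data_group)
    except ValueError:
        # Insert the missing space between the two groups, then parse again.
        tidy_data_group = re.sub(validation_pattern, r'\1 \2', dirty_data_group)
        date = dateutil.parser.parse(tidy_data_group)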

This should put the right date in the list whenever the only problem is the one you describe, i.e. no space between the day of the month and the month itself. It does not handle spaces missing elsewhere in the string.
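If spaces can go missing in other places too, one option is to normalise the text before parsing rather than only after a failure. The following is a rough sketch, not a tested solution for every email: `normalize_date_text` and `_SPACING_RULES` are hypothetical names, and the rules assume the layout from the sample body (time, weekday, ordinal day, month name, year). They would also split other formats, such as ISO strings containing a `T` separator, so they are only suitable for this kind of input.

```python
import re

# Hypothetical helper rules, tailored to strings such as
# "06:00 Wednesday 22nd February 2017". Each rule inserts one missing space.
_SPACING_RULES = [
    # Ordinal day run into the next word: "22ndFebruary" -> "22nd February"
    (re.compile(r'(\d(?:st|nd|rd|th))(?=[A-Za-z])'), r'\1 '),
    # Word run into a number: "February2017" -> "February 2017"
    (re.compile(r'([A-Za-z])(?=\d)'), r'\1 '),
    # Number run into a capitalised word: "06:00Wednesday" -> "06:00 Wednesday"
    (re.compile(r'(\d)(?=[A-Z])'), r'\1 '),
]


def normalize_date_text(text):
    """Return text with the assumed missing spaces restored and ends trimmed."""
    for pattern, replacement in _SPACING_RULES:
        text = pattern.sub(replacement, text)
    return text.strip()
```

The cleaned string can then be passed to `dateutil.parser.parse` in place of the raw regex match. Strings that are already well formed pass through unchanged apart from the trimmed whitespace.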

Upvotes: 1
