Bijan

Reputation: 8594

Python: Parse String as Date with Formatting

A user can enter a string that contains a date in either MM/DD/YY or MM/DD/YYYY format. Is there an efficient way to pull the date out of the string? I was thinking of using a regex such as \d+\/\d+\/\d+. I also want to be able to sort the dates: if the strings contain 8/17/15 and 08/16/2015, the 8/16 date should be listed first and then 8/17.

Upvotes: 1

Views: 2573

Answers (6)

plamut

Reputation: 3206

Have a look at datetime.strptime, a class method in the standard library that creates a datetime object from a string. It takes the string to convert and the format the date is written in.

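from datetime import datetime

def str_to_date(string):
    # Try both year widths; a length check misreads inputs such as '8/6/2015'
    for pattern in ('%m/%d/%Y', '%m/%d/%y'):
        try:
            return datetime.strptime(string, pattern).date()
        except ValueError:
            continue
    # TODO: handle invalid input
    raise ValueError('no valid date in %r' % (string,))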

The function returns a date object, which can be compared directly with other date objects (e.g. when sorting them).

Usage:

>>> d1 = str_to_date('08/13/2015')
>>> d2 = str_to_date('08/12/15')
>>> d1
datetime.date(2015, 8, 13)
>>> d2
datetime.date(2015, 8, 12)
>>> d1 > d2
True
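
Since the question also asks about sorting, here is a minimal sketch of ordering the raw strings by the date they contain. The helper name parse_us_date and the sample list are made up for illustration; the helper simply tries both year formats in turn.

```python
from datetime import datetime

def parse_us_date(text):
    """Hypothetical helper: parse MM/DD/YYYY or MM/DD/YY into a date."""
    for fmt in ('%m/%d/%Y', '%m/%d/%y'):
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    raise ValueError('unrecognised date: %r' % (text,))

raw = ['8/17/15', '08/16/2015', '12/01/14']

# sorted() calls the helper once per string and orders by the parsed date,
# while the original strings are what end up in the result
ordered = sorted(raw, key=parse_us_date)
print(ordered)  # ['12/01/14', '08/16/2015', '8/17/15']
```

Sorting with a key keeps the user's original spelling of each date, which may or may not be what you want; map the helper over the list first if you would rather keep date objects.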

Update

OP explained in a comment that strings such as 'foo 08/13/2015 bar' should not be automatically thrown away, and that the date should be extracted from them.

To achieve that, we must first search for a candidate string in the user's input:

import re
from datetime import date

user_string = input('Enter something')  # use raw_input() in Python 2.x

# \d{1,2} also accepts single-digit months and days such as 8/17/15
pattern = re.compile(r'(\d{1,2})/(\d{1,2})/(\d{4}|\d{2})')  # 4 digits match first!
match = re.search(pattern, user_string)

if not match:
    d = None
else:
    month, day, year = map(int, match.groups())
    if year < 100:
        # expand a two-digit year with the same pivot strptime's %y uses
        year += 2000 if year < 69 else 1900
    try:
        d = date(year, month, day)
    except ValueError:
        d = None  # or handle error in a different way

print(d)

The code reads user input and then tries to find a pattern in it that represents a date in MM/DD/YYYY or MM/DD/YY format. Note that the last capturing group (in parentheses, i.e. ()) checks for either four or two consecutive digits.

If it finds a candidate date, it unpacks the capturing groups in the match, converting them to integers at the same time. It then uses the three matched pieces to try to create a new date object. If that fails, the candidate date was invalid, e.g. '02/31/2015'.

Footnotes:

  • the code will only catch the first date candidate in the input
  • the regular expression used will, in its current form, also match dates in inputs like '12308/13/2015123'. If this is not desirable it would have to be modified, probably adding some lookahead/lookbehind assertions.
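
Both footnotes can be addressed in one go. The following is a rough sketch, not a drop-in replacement: the names DATE_RE and find_dates are invented here, and the handling of two-digit years is an assumption that borrows the pivot strptime's %y uses. Lookbehind and lookahead assertions reject candidates glued to other digits, and finditer collects every candidate instead of only the first.

```python
import re
from datetime import date

# (?<!\d) and (?!\d) reject matches that touch other digits
DATE_RE = re.compile(r'(?<!\d)(\d{1,2})/(\d{1,2})/(\d{4}|\d{2})(?!\d)')

def find_dates(text):
    """Hypothetical helper: return every valid date in text, in order."""
    found = []
    for match in DATE_RE.finditer(text):
        month, day, year = map(int, match.groups())
        if year < 100:
            # assumption: same two-digit pivot as strptime's %y
            year += 2000 if year < 69 else 1900
        try:
            found.append(date(year, month, day))
        except ValueError:
            pass  # impossible calendar date, e.g. 02/31/2015
    return found

print(find_dates('due 8/17/15, sent 08/16/2015, bad 02/31/2015'))
print(find_dates('12308/13/2015123'))  # []
```

The first call yields the two valid dates and silently skips the impossible one; whether skipping is the right policy depends on how you want to report bad input.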

Upvotes: 3

hiro protagonist

Reputation: 46849

You could also try time.strptime:

import time

dates = ('08/17/15', '8/16/2015')

for date in dates:
    print(date)
    ret = None
    try:
        ret = time.strptime(date, "%m/%d/%Y")
    except ValueError:
        ret = time.strptime(date, "%m/%d/%y")
    print(ret)

UPDATE

After the comments: this way you get a valid date back, or None if the date cannot be parsed:

import time

dates = ('08/17/15', '8/16/2015', '02/31/15')

for date in dates:
    print(date)
    ret = None
    try:
        ret = time.strptime(date, "%m/%d/%Y")
    except ValueError:
        try:
            ret = time.strptime(date, "%m/%d/%y")
        except ValueError:
            pass
    print(ret)

UPDATE 2

One more update after the comments about the requirements.

This version only takes care of the dates, not the text before or after them, but that text can easily be extracted using the regex group:

import re
import time

dates = ('foo 1 08/17/15', '8/16/2015 bar 2', 'foo 3 02/31/15 bar 4')

for date in dates:
    print(date)
    match = re.search('(?P<date>[0-9]+/[0-9]+/[0-9]+)', date)
    if match is None:
        # no date-like token in this string
        print(None)
        continue
    date_str = match.group('date')
    ret = None
    try:
        ret = time.strptime(date_str, "%m/%d/%Y")
    except ValueError:
        try:
            ret = time.strptime(date_str, "%m/%d/%y")
        except ValueError:
            pass
    print(ret)
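
If the parsed values also need to be sorted, one option is to rely on the fact that time.struct_time compares like a tuple (year first, then month, then day). The sketch below wraps the loop body above in a function; extract_struct_time and the extra 'no date' input are illustrative names and data, not part of the original answer.

```python
import re
import time

def extract_struct_time(text):
    """Hypothetical helper: struct_time for the first date-like token, or None."""
    match = re.search(r'[0-9]+/[0-9]+/[0-9]+', text)
    if match is None:
        return None
    for fmt in ('%m/%d/%Y', '%m/%d/%y'):
        try:
            return time.strptime(match.group(0), fmt)
        except ValueError:
            continue
    return None

lines = ('foo 1 08/17/15', '8/16/2015 bar 2', 'foo 3 02/31/15 bar 4', 'no date')
parsed = [(extract_struct_time(line), line) for line in lines]

# Drop unparseable inputs, then sort chronologically on the struct_time
valid = sorted(pair for pair in parsed if pair[0] is not None)
print([line for _, line in valid])  # ['8/16/2015 bar 2', 'foo 1 08/17/15']
```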

Upvotes: 3

amza

Reputation: 810

Why not use strptime to store them as datetime objects? These objects can then easily be compared and sorted.

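import datetime

dateList = []
text = "08/03/2015"  # "08/03/15" works too, via the fallback below
try:
    date = datetime.datetime.strptime(text, "%m/%d/%Y")
except ValueError:
    # not a four-digit year, so try the two-digit form
    date = datetime.datetime.strptime(text, "%m/%d/%y")
dateList.append(date)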

Note the difference between %Y and %y. You can then just compare dates made this way to see which ones are greater or less. You can also sort it using dateList.sort()
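
To make the %Y versus %y distinction concrete, here is a small sketch (variable names are arbitrary). One detail worth knowing, per the Python documentation, is that %y maps 69 to 99 onto 1969 to 1999 and 0 to 68 onto 2000 to 2068.

```python
from datetime import datetime

# %y reads a two-digit year and expands it to a full year
short = datetime.strptime('08/04/15', '%m/%d/%y')
# %Y expects all four digits
full = datetime.strptime('08/03/2015', '%m/%d/%Y')
print(short.year, full.year)  # 2015 2015

# Two-digit years from 69 upwards land in the 1900s
old = datetime.strptime('08/04/69', '%m/%d/%y')
print(old.year)  # 1969

# Using the wrong directive raises ValueError instead of guessing
rejected = False
try:
    datetime.strptime('08/04/15', '%m/%d/%Y')
except ValueError:
    rejected = True
print(rejected)  # True
```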

If you want the date as a string again you can use:

>>> dateString = date.strftime("%Y-%m-%d")
>>> print(dateString)
2015-08-03

Upvotes: 1

Dan

Reputation: 10171

Using regex groups we'd get something like this:

import re
ddate = '08/16/2015'

reg = re.compile(r'(\d+)/(\d+)/(\d+)')  # raw string; '/' needs no escaping
matching = reg.match(ddate)
if matching is not None:
    print(matching.groups())

This would yield

('08', '16', '2015')

You could parse this afterwards, but if you want to drop the leading zeros in the first place, you could use

reg = re.compile(r'0*(\d+)/0*(\d+)/(\d+)')
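
As a sketch of the "parse this afterwards" route: converting the captured groups with int() already discards leading zeros, so the simpler pattern is enough if the goal is a date object. This assumes a four-digit year, as in the example input.

```python
import re
from datetime import date

reg = re.compile(r'(\d+)/(\d+)/(\d+)')
matching = reg.match('08/16/2015')
if matching is not None:
    # int('08') == 8, so no special handling of leading zeros is needed
    month, day, year = (int(part) for part in matching.groups())
    print(month, day, year)        # 8 16 2015
    print(date(year, month, day))  # 2015-08-16
```

Note that reg.match only matches at the start of the string; use reg.search if the date can be surrounded by other text.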

Upvotes: 0

Alexander

Reputation: 109526

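You can use the date parser from Pandas. Note that pd.datetools was deprecated in pandas 0.19 and is not available in recent releases, so this needs an older pandas.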

>>> import pandas as pd
>>> timestr = ['8/8/95', '8/15/2014']
>>> [pd.datetools.parse(d) for d in timestr]
[datetime.datetime(1995, 8, 8, 0, 0), datetime.datetime(2014, 8, 15, 0, 0)]

Upvotes: 0

DeepSpace

Reputation: 81594

Why bother with regex when you can use datetime.strptime?

Upvotes: 0
