Reputation: 8594
A user can input a string that contains a date in one of the following formats: MM/DD/YY or MM/DD/YYYY. Is there an efficient way to pull the date out of the string? I was thinking of using a regex such as \d+\/\d+\/\d+. I also want to be able to sort the dates, i.e. if the strings contain 8/17/15 and 08/16/2015, it should list the 8/16 date first and then 8/17.
Upvotes: 1
Views: 2573
Reputation: 3206
Have a look at datetime.strptime. It is a built-in class method that creates a datetime object from a string; it takes the string to convert and the format the date is written in.
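from datetime import datetime

def str_to_date(string):
    # choose the format from the year field: four digits -> %Y, two digits -> %y
    # (checking len(string) > 8 would misfire on e.g. '8/1/2015')
    year_part = string.rsplit('/', 1)[-1]
    pattern = '%m/%d/%Y' if len(year_part) == 4 else '%m/%d/%y'
    try:
        return datetime.strptime(string, pattern).date()
    except ValueError:
        raise  # TODO: handle invalid input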
The function returns a date() object, which can be compared directly with other date() objects (e.g. when sorting them).
Usage:
>>> d1 = str_to_date('08/13/2015')
>>> d2 = str_to_date('08/12/15')
>>> d1
datetime.date(2015, 8, 13)
>>> d2
datetime.date(2015, 8, 12)
>>> d1 > d2
True
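To tie this back to the sorting part of the question, here is a minimal sketch that parses both of the question's example strings and orders them chronologically. It assumes every input is a bare date string in M/D/YY or M/D/YYYY form; the helper name parse_us_date is hypothetical, and it chooses the format from the length of the year field.

```python
from datetime import datetime


def parse_us_date(text):
    """Parse 'M/D/YY' or 'M/D/YYYY' into a datetime.date (hypothetical helper)."""
    year_part = text.rsplit('/', 1)[-1]
    # four-digit year -> %Y, otherwise assume a two-digit year -> %y
    fmt = '%m/%d/%Y' if len(year_part) == 4 else '%m/%d/%y'
    return datetime.strptime(text, fmt).date()


# the two example strings from the question
raw = ['8/17/15', '08/16/2015']
ordered = sorted(raw, key=parse_us_date)  # sort the strings by their date value
print(ordered)  # ['08/16/2015', '8/17/15']
```

Using sorted() with a key keeps the original strings while ordering them by the parsed date.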
The OP explained in a comment that strings such as 'foo 08/13/2015 bar' should not be thrown away automatically, and that the date should be extracted from them. To achieve that, we must first search for a candidate date string in the user's input:
import re
from datetime import date

user_string = input('Enter something: ')  # use raw_input() in Python 2.x
pattern = re.compile(r'(\d{2})/(\d{2})/(\d{4}|\d{2})')  # 4 digits match first!
match = pattern.search(user_string)
if not match:
    d = None
else:
    month, day, year = map(int, match.groups())
    if len(match.group(3)) == 2:
        # two-digit year: use strptime's %y rule (69-99 -> 19xx, 00-68 -> 20xx)
        year += 1900 if year >= 69 else 2000
    try:
        d = date(year, month, day)
    except ValueError:
        d = None  # or handle the error in a different way
print(d)
The code reads user input and then tries to find a pattern in it that represents a date in MM/DD/YYYY or MM/DD/YY format. Note that the last capturing group (in parentheses, i.e. ()) checks for either four or two consecutive digits; the four-digit alternative is listed first so that a year like 2015 is not cut short to 20.
If it finds a candidate date, it unpacks the capturing groups in the match, converting them to integers at the same time. It then uses the three matched pieces to try to create a new date() object. If that fails, the candidate date was invalid, e.g. '02/31/2015'.
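The pattern above requires two-digit months and days, so it would not find the question's 8/17/15. A minimal sketch, assuming month and day may have one or two digits and that a string may contain several dates, collects every valid date with re.finditer and returns them sorted. The function name extract_dates is hypothetical; invalid candidates such as 02/31/2015 are silently skipped, and two-digit years follow the same 69/68 pivot that strptime's %y uses.

```python
import re
from datetime import date

# month/day: 1-2 digits; year: 4 digits tried before 2; \b avoids matching inside longer numbers
DATE_RE = re.compile(r'\b(\d{1,2})/(\d{1,2})/(\d{4}|\d{2})\b')


def extract_dates(text):
    """Return all valid M/D/YY or M/D/YYYY dates in text, sorted (hypothetical helper)."""
    found = []
    for m in DATE_RE.finditer(text):
        month, day, year = map(int, m.groups())
        if len(m.group(3)) == 2:
            # same pivot as strptime's %y: 69-99 -> 1900s, 00-68 -> 2000s
            year += 1900 if year >= 69 else 2000
        try:
            found.append(date(year, month, day))
        except ValueError:
            pass  # not a real calendar date, e.g. 02/31/2015
    return sorted(found)


print(extract_dates('foo 8/17/15 bar 08/16/2015 baz 02/31/2015'))
# [datetime.date(2015, 8, 16), datetime.date(2015, 8, 17)]
```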
Upvotes: 3
Reputation: 46849
You could also try time.strptime:
import time

dates = ('08/17/15', '8/16/2015')
for date in dates:
    print(date)
    ret = None
    try:
        ret = time.strptime(date, "%m/%d/%Y")
    except ValueError:
        ret = time.strptime(date, "%m/%d/%y")
    print(ret)
UPDATE
Update after the comments: this way you get a valid date back, or None if the date cannot be parsed:
import time

dates = ('08/17/15', '8/16/2015', '02/31/15')
for date in dates:
    print(date)
    ret = None
    try:
        ret = time.strptime(date, "%m/%d/%Y")
    except ValueError:
        try:
            ret = time.strptime(date, "%m/%d/%y")
        except ValueError:
            pass
    print(ret)
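time.strptime returns a time.struct_time rather than a date. If the goal is the sorting asked for in the question, one option, sketched here under the assumption that unparseable strings should simply be dropped, is to build a datetime.date from the struct_time's year, month and day fields and sort those. The helper name parse_or_none is hypothetical.

```python
import time
from datetime import date


def parse_or_none(text):
    """Try MM/DD/YYYY, then MM/DD/YY; return a datetime.date or None (hypothetical helper)."""
    for fmt in ("%m/%d/%Y", "%m/%d/%y"):
        try:
            st = time.strptime(text, fmt)
        except ValueError:
            continue  # wrong format or impossible date, try the next format
        # keep only the calendar part of the struct_time
        return date(st.tm_year, st.tm_mon, st.tm_mday)
    return None


dates = ('08/17/15', '8/16/2015', '02/31/15')
parsed = [d for d in map(parse_or_none, dates) if d is not None]
print(sorted(parsed))  # [datetime.date(2015, 8, 16), datetime.date(2015, 8, 17)]
```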
UPDATE 2
One more update after the comments about the requirements. This version only takes care of the dates, not the text before or after them, but using the regex match that text can easily be extracted too:
import re
import time

dates = ('foo 1 08/17/15', '8/16/2015 bar 2', 'foo 3 02/31/15 bar 4')
for date in dates:
    print(date)
    ret = None
    match = re.search(r'(?P<date>[0-9]+/[0-9]+/[0-9]+)', date)
    if match is not None:  # strings without a date-like part leave ret as None
        date_str = match.group('date')
        try:
            ret = time.strptime(date_str, "%m/%d/%Y")
        except ValueError:
            try:
                ret = time.strptime(date_str, "%m/%d/%y")
            except ValueError:
                pass
    print(ret)
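The text before and after the date can be recovered from the same regex match. A minimal sketch of that idea, assuming only the first date-like run in each string matters: match.start() and match.end() give the slice boundaries, so the surrounding text is plain string slicing. The function name split_around_date is hypothetical.

```python
import re

DATE_PATTERN = re.compile(r'(?P<date>[0-9]+/[0-9]+/[0-9]+)')


def split_around_date(text):
    """Return (before, date_str, after) for the first date-like run, or None (hypothetical helper)."""
    match = DATE_PATTERN.search(text)
    if match is None:
        return None
    before = text[:match.start()].strip()  # everything left of the date
    after = text[match.end():].strip()     # everything right of the date
    return before, match.group('date'), after


print(split_around_date('foo 1 08/17/15'))        # ('foo 1', '08/17/15', '')
print(split_around_date('8/16/2015 bar 2'))       # ('', '8/16/2015', 'bar 2')
print(split_around_date('foo 3 02/31/15 bar 4'))  # ('foo 3', '02/31/15', 'bar 4')
```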
Upvotes: 3
Reputation: 810
Why not use strptime to store them as datetime objects? These objects can easily be compared and sorted.
import datetime

dateList = []
date_string = "08/03/2015"
try:
    date = datetime.datetime.strptime(date_string, "%m/%d/%Y")  # four-digit year
except ValueError:
    date = datetime.datetime.strptime(date_string, "%m/%d/%y")  # fall back to two-digit year
dateList.append(date)
Note the difference between %Y (four-digit year) and %y (two-digit year). You can then compare dates made this way directly to see which one is earlier or later, and you can sort the list with dateList.sort().
If you want the date as a string again, you can use:
>>> dateString = date.strftime("%Y-%m-%d")
>>> print(dateString)
2015-08-03
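To see why converting before sorting matters, the sketch below compares plain string sorting with sorting the parsed datetime objects. The input strings are made up for the example, and the helper name to_datetime is hypothetical; it uses the same %Y-then-%y fallback but catches only ValueError.

```python
import datetime


def to_datetime(text):
    """Parse MM/DD/YYYY, falling back to MM/DD/YY (hypothetical helper)."""
    try:
        return datetime.datetime.strptime(text, "%m/%d/%Y")
    except ValueError:
        return datetime.datetime.strptime(text, "%m/%d/%y")


raw = ['08/16/2015', '12/01/14', '8/17/15']
print(sorted(raw))  # string order, not date order: ['08/16/2015', '12/01/14', '8/17/15']

dateList = [to_datetime(s) for s in raw]
dateList.sort()  # datetime objects sort chronologically
print([d.strftime("%Y-%m-%d") for d in dateList])
# ['2014-12-01', '2015-08-16', '2015-08-17']
```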
Upvotes: 1
Reputation: 10171
Using regex groups we'd get something like this:
import re

ddate = '08/16/2015'
reg = re.compile(r'(\d+)/(\d+)/(\d+)')
matching = reg.match(ddate)
if matching is not None:
    print(matching.groups())
This would print:
('08', '16', '2015')
You could parse these afterwards, but if you want to get rid of the leading zeros in the first place, you could use:
reg = re.compile(r'0*(\d+)/0*(\d+)/(\d+)')
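Another way to drop the leading zeros, instead of handling them in the pattern, is to convert the captured groups with int(), which also gives numbers that compare numerically. A minimal sketch, assuming the match is made against a bare date string as in the example above; the year is left as written, so a two-digit year stays two digits.

```python
import re

reg = re.compile(r'(\d+)/(\d+)/(\d+)')

matching = reg.match('08/16/2015')
if matching is not None:
    # int('08') == 8, so the leading zero disappears in the conversion
    month, day, year = (int(g) for g in matching.groups())
    print((month, day, year))  # (8, 16, 2015)
```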
Upvotes: 0