Reputation: 2609
I have some dates that contain multiple days that I am trying to parse. It seems that the datetime.strptime function does not support regular expressions and thus I cannot get it to ignore one day at a time. Is there an easy solution to this that I am missing?
Here are some examples:
March 20 & June 8, 2011
September 4 & 27, 2010
February 15, December 5 & 6, 2013
I know that these examples differ quite drastically from one another, but I am hoping to get a solution for even one of them. An approach that works across a wide range of inputs with some formatting parameter would be awesome.
Additionally, there may be cases where the date is formatted differently, which I assume should be easier to handle:
7/2/2011 & 8/9/2011
Upvotes: 0
Views: 432
Reputation: 63739
Pyparsing is a handy Python module for parsing strings like this. Here is an annotated parser that cracks your input strings and gives months, days, and years for each:
import pyparsing as pp
import calendar
COMMA = pp.Suppress(',')
AMP = pp.Suppress('&')
DASH = pp.Suppress('-')
# use pyparsing-defined integer expression, which auto-converts parsed str's to int's
day_number = pp.pyparsing_common.integer()
# day numbers only go from 1-31
day_number.addCondition(lambda t: 1 <= t[0] <= 31)
# not in the spec, but let's support day ranges, too!
day_range = day_number("first") + DASH + day_number("last")
# parse-time conversion from "4-6" to [4, 5, 6]
day_range.addParseAction(lambda t: list(range(t.first, t.last+1)))
# this function will come in handy to build list parsers of day numbers and month-day
expr_list = lambda expr: expr + pp.ZeroOrMore(COMMA + expr) + pp.Optional(AMP + expr)
# support "10", "10 & 11", "10, 11, & 12"
day_list = expr_list(day_range | day_number)
# get the month names from the calendar module
month_name = pp.oneOf(calendar.month_name[1:])
# an expression containing a month name and a list of 1 or more day numbers
date_expr = pp.Group(month_name("month") + day_list("days"))
# use expr_list again to support multiple date_exprs separated by commas and ampersands
date_list = expr_list(date_expr)
year_number = pp.pyparsing_common.integer()
# year numbers start with 2000
year_number.addCondition(lambda t: t[0] >= 2000)
# put all together into a single parser expression
full_date = date_list("dates") + COMMA + year_number("year")
tests = """\
March 20 & June 8, 2011
September 4 & 27, 2010
February 15, December 5 & 6, 2013
September 4-6, 2010
"""
full_date.runTests(tests)
Prints:
March 20 & June 8, 2011
[['March', 20], ['June', 8], 2011]
- dates: [['March', 20], ['June', 8]]
[0]:
['March', 20]
- days: [20]
- month: 'March'
[1]:
['June', 8]
- days: [8]
- month: 'June'
- year: 2011
September 4 & 27, 2010
[['September', 4, 27], 2010]
- dates: [['September', 4, 27]]
[0]:
['September', 4, 27]
- days: [4, 27]
- month: 'September'
- year: 2010
February 15, December 5 & 6, 2013
[['February', 15], ['December', 5, 6], 2013]
- dates: [['February', 15], ['December', 5, 6]]
[0]:
['February', 15]
- days: [15]
- month: 'February'
[1]:
['December', 5, 6]
- days: [5, 6]
- month: 'December'
- year: 2013
September 4-6, 2010
[['September', 4, 5, 6], 2010]
- dates: [['September', 4, 5, 6]]
[0]:
['September', 4, 5, 6]
- days: [4, 5, 6]
- month: 'September'
- year: 2010
To get (year, month, day) tuples, we add another parse action and rerun the tests:
print("convert parsed fields into (year, month-name, date) tuples")
def expand_dates(t):
    return [(t.year, d.month, dy) for d in t.dates for dy in d.days]
full_date.addParseAction(expand_dates)
full_date.runTests(tests)
Prints:
convert parsed fields into (year, month-name, date) tuples
March 20 & June 8, 2011
[(2011, 'March', 20), (2011, 'June', 8)]
September 4 & 27, 2010
[(2010, 'September', 4), (2010, 'September', 27)]
February 15, December 5 & 6, 2013
[(2013, 'February', 15), (2013, 'December', 5), (2013, 'December', 6)]
September 4-6, 2010
[(2010, 'September', 4), (2010, 'September', 5), (2010, 'September', 6)]
Finally, make them into datetime.date objects with another parse action:
print("convert (year, month-name, date) tuples into datetime.date's")
# define mapping of month-name to month number 1-12
month_map = {name: num for num,name in enumerate(calendar.month_name[1:], start=1)}
from datetime import date
full_date.addParseAction(pp.tokenMap(lambda t: date(t[0], month_map[t[1]], t[2])))
full_date.runTests(tests)
Prints:
convert (year, month-name, date) tuples into datetime.date's
March 20 & June 8, 2011
[datetime.date(2011, 3, 20), datetime.date(2011, 6, 8)]
September 4 & 27, 2010
[datetime.date(2010, 9, 4), datetime.date(2010, 9, 27)]
February 15, December 5 & 6, 2013
[datetime.date(2013, 2, 15), datetime.date(2013, 12, 5), datetime.date(2013, 12, 6)]
September 4-6, 2010
[datetime.date(2010, 9, 4), datetime.date(2010, 9, 5), datetime.date(2010, 9, 6)]
Upvotes: 0
Reputation: 2609
All of the above answers have been good and I figured out another method that allows for multiple years:
from datetime import datetime
import re

date1 = "March 20 & June 8, 2011"
date2 = "September 4 & 27, 2010"
date3 = "February 15, December 5 & 6, 2013"

def extract_dates(date):
    dates = []
    last_index = None
    # each four-digit year closes the group of months and days before it
    for year in re.finditer(r'\d{4}', date):
        if last_index is None:
            text = date[:year.span(0)[0]]
        else:
            text = date[last_index:year.span(0)[0]]
        last_index = year.span(0)[1]
        # [A-Za-z] rather than [A-z], which would also match [ \ ] ^ _ and `
        months = [match for match in re.finditer(r'[A-Za-z]+', text)]
        for m, month in enumerate(months):
            # the days for a month run up to the next month name (or end of text)
            if m == len(months) - 1:
                text_days = text[month.span(0)[1]:]
            else:
                text_days = text[month.span(0)[1]:months[m + 1].span(0)[0]]
            for day in re.finditer(r'\d{1,2}', text_days):
                dates.append(datetime.strptime(month.group(0) + ' ' + day.group(0) + ', ' + year.group(0), '%B %d, %Y'))
    return dates

print(extract_dates(date1))
print(extract_dates(date2))
print(extract_dates(date3))
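The three prints above each use a single year, so the multi-year case is not actually exercised. Here is a minimal sketch of the same idea written as a token scan. The name parse_multi_dates and the second test string are made up for illustration, and it assumes English month names (an English locale for %B) and four-digit years:

```python
import re
from datetime import datetime

def parse_multi_dates(text):
    """Return a list of datetime.date objects found in text."""
    dates = []
    pending = []   # (month_name, day) pairs still waiting for their year
    month = None   # most recent month name seen
    for token in re.findall(r"[A-Za-z]+|\d+", text):
        if token.isalpha():
            month = token
        elif len(token) == 4:
            # a four-digit number is taken as the year for all pending pairs
            for name, day in pending:
                parsed = datetime.strptime(f"{name} {day} {token}", "%B %d %Y")
                dates.append(parsed.date())
            pending = []
        else:
            # a day number inherits the most recent month name
            pending.append((month, token))
    return dates

print(parse_multi_dates("February 15, December 5 & 6, 2013"))
print(parse_multi_dates("March 20, 2011 & June 8, 2012"))
```

A day number that appears before any month name makes strptime raise ValueError, which is probably the behaviour you want for malformed input.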
Upvotes: 0
Reputation: 82765
This is one approach using the datetime module.
Demo:
import datetime

d1 = "March 20 & June 8, 2011"
d2 = "February 15, December 5 & 6, 2013"

def getDate(in_value):
    result = []
    in_value = in_value.split(",")
    year = in_value.pop(-1)
    for dateV in in_value:
        if "&" in dateV:
            temp = []
            val = dateV.split()
            month = val.pop(0)
            for i in val:
                if i.isdigit():
                    temp.append(datetime.datetime.strptime("{}-{}-{}".format(year, month, i).strip(), "%Y-%B-%d").strftime("%m/%d/%Y"))
                elif i != "&":
                    # a new month name (e.g. "June") applies to the days that follow
                    month = i
            result.append(" & ".join(temp))
        else:
            result.append(datetime.datetime.strptime(dateV.strip() + year, "%B %d %Y").strftime("%m/%d/%Y"))
    return ", ".join(result)

print( getDate(d1) )
print( getDate(d2) )
Output:
03/20/2011 & 06/08/2011
02/15/2013, 12/05/2013 & 12/06/2013
Upvotes: 1
Reputation: 22503
Probably not the best way to do it, but this is my attempt:
import re
from datetime import datetime  # needed for strptime below

date1 = "March 20 & June 8, 2011"
date2 = "September 4 & 27, 2010"
date3 = "February 15, December 5 & 6, 2013"
date_group = [date1, date2, date3]

for date in date_group:
    # matches a year, "Month D & D", or "Month D"
    result = re.findall(r"\d{4}|[A-Z][a-z]+ \d{1,2} & \d{1,2}|[A-Z][a-z]+ \d{1,2}", date)
    year = result[-1]
    for i in range(len(result) - 1):
        d = result[i].split(" ")
        try:
            d.remove("&")
        except ValueError:
            pass  # no "&" in this piece
        # d[0] is the month name; every remaining item is a day number
        for a in range(1, len(d)):
            date_str = d[0] + '{:02d}'.format(int(d[a])) + year
            time_date = datetime.strptime(date_str, "%B%d%Y")
            print(time_date)
Result:
2011-03-20 00:00:00
2011-06-08 00:00:00
2010-09-04 00:00:00
2010-09-27 00:00:00
2013-02-15 00:00:00
2013-12-05 00:00:00
2013-12-06 00:00:00
Basically, this extracts the year first and then the dates, so it will not work if a string contains more than one year.
Upvotes: 1
Reputation: 3156
I would start by splitting the date strings into valid dates:
import re

def split_date(d):
    # split on commas and ampersands; straight quotes, and no "|" inside the character class
    return re.split(r'[,&]', d)
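For the numeric format from the question (7/2/2011 & 8/9/2011), splitting is enough on its own, because every piece is a complete date. Here is a minimal sketch. The name parse_numeric_dates is made up for illustration, and month/day/year order is an assumption, since the question does not say which order applies:

```python
import re
from datetime import datetime

def parse_numeric_dates(text, fmt="%m/%d/%Y"):
    """Split on ',' or '&' and parse each non-empty piece with fmt."""
    pieces = [p.strip() for p in re.split(r"[,&]", text)]
    return [datetime.strptime(p, fmt).date() for p in pieces if p]

print(parse_numeric_dates("7/2/2011 & 8/9/2011"))
```

For the month-name strings the pieces are not complete dates (for example "27" or "2011" on their own), so the month and year still have to be carried over from the neighbouring pieces.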
Upvotes: 0