Reputation: 751
I'm a Python newbie. My script (below) contains a function named "fn_regex_raw_date_string" that is intended to convert a "raw" date string such as Mon, Oct 31, 2011 at 8:15 PM into a date string such as _2011-Oct-31_PM_8-15_
Question No. 1: When the "raw" date string contains extraneous characters (e.g., xxxxxMon, Oct 31, 2011 at 8:15 PMyyyyyy), how should I modify my regular expression routine to exclude them?
I was tempted to remove my comments from the script below to make it simpler to read, but I thought it might be more helpful to leave them in.
Question No. 2: I suspect that I should write another function that replaces the "Oct" in "_2011-Oct-31_PM_8-15_" with "10". But I can't help wondering whether there is some way to include that functionality in my fn_regex_raw_date_string function.
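One way to keep the month conversion inside the same function: re.sub() also accepts a function instead of a template string, and that function can look up the month number. The sketch below is a hypothetical drop-in variant of fn_regex_raw_date_string; MONTHS, MONTH_NUMBERS, DATE_RE and _to_numeric are illustrative names, and DATE_RE is the regex from the script rewritten with named groups. Like the original, it does not remove extraneous characters around the date.

```python
import re

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
# Map "Jan" -> "01", ..., "Oct" -> "10", ..., "Dec" -> "12".
MONTH_NUMBERS = {abbr: "%02d" % (i + 1) for i, abbr in enumerate(MONTHS)}

# Same structure as the script's pattern, but with named groups
# so the replacement does not depend on counting parentheses.
DATE_RE = re.compile(
    r"(?:Sun|Mon|Tue|Wed|Thu|Fri|Sat), "
    r"(?P<mon>" + "|".join(MONTHS) + r") "
    r"(?P<day>\d{1,2}), (?P<year>\d{4}) at "
    r"(?P<hour>\d{1,2}):(?P<minute>\d{1,2}) (?P<ampm>[AP])M"
)

def _to_numeric(m):
    # re.sub() calls this once per match and uses the returned
    # string as the replacement text.
    return "_{0}-{1}-{2}_{3}M_{4}-{5}_".format(
        m.group("year"), MONTH_NUMBERS[m.group("mon")], m.group("day"),
        m.group("ampm"), m.group("hour"), m.group("minute"))

def fn_regex_raw_date_string(date_string_raw):
    return DATE_RE.sub(_to_numeric, date_string_raw)

print(fn_regex_raw_date_string("Mon, Oct 31, 2011 at 8:15 PM"))  # _2011-10-31_PM_8-15_
```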
Any help would be much appreciated.
Thank you, Marceepoo
```python
import sys
import re, pdb
#pdb.set_trace()

def fn_get_datestring_sysarg():
    this_scriptz_FULLName = sys.argv[0]
    try:
        date_string_raw = sys.argv[1]
    #except Exception, e:
    except Exception:
        date_string_raw_error = this_scriptz_FULLName + ': sys.argv[1] error: No command line argument supplied'
        print(date_string_raw_error)
        sys.exit(1)  # stop here: date_string_raw was never assigned
    #returnval = this_scriptz_FULLName + '\n' + date_string_raw
    returnval = date_string_raw
    return returnval

def fn_regex_raw_date_string(date_string_raw):
    # Do re replacements
    # p:\Data\VB\Python_MarcsPrgs\Python_ItWorks\FixCodeFromLegislaturezCalifCode_MikezCode.py
    # see also (fnmatch) p:\Data\VB\Python_MarcsPrgs\Python_ItWorks\bookmarkPDFs.aab.py
    #srchstring = r"(.?+)(Sun|Mon|Tue|Wed|Thu|Fri|Sat)(, )(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)( )([\d]{1,2})(, )([\d]{4})( at )([\d]{1,2})(\:)([\d]{1,2})( )(A|P)(M)(.?+)"
    srchstring = r"(Sun|Mon|Tue|Wed|Thu|Fri|Sat)(, )(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)( )([\d]{1,2})(, )([\d]{4})( at )([\d]{1,2})(\:)([\d]{1,2})( )(A|P)(M)"
    srchstring = re.compile(srchstring)
    replacement = r"_\7-\3-\5_\13M_\9-\11_"
    #replacement = r"_\8-\4-\6_\14M_\10-\12_"
    regex_raw_date_string = srchstring.sub(replacement, date_string_raw)
    return regex_raw_date_string

# Mon, Oct 31, 2011 at 8:15 PM
if __name__ == '__main__':
    try:
        this_scriptz_FULLName = sys.argv[0]
        date_string_raw = fn_get_datestring_sysarg()
        date_string_mbh = fn_regex_raw_date_string(date_string_raw)
        print(date_string_mbh)
    except Exception:  # a bare except would also swallow sys.exit()
        print('error occurred - fn_get_datestring_sysarg()')
```
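Regarding Question No. 1: re.sub() only rewrites the part of the string that matches, which is why the xxxxx and yyyyyy survive. A minimal sketch of one fix, assuming the same pattern and replacement template as fn_regex_raw_date_string above: find the date with search() and build the output with Match.expand(), so text outside the match is never copied. SRCH, REPLACEMENT and extract_date_string are hypothetical names.

```python
import re

# Same pattern and replacement template as fn_regex_raw_date_string.
SRCH = re.compile(r"(Sun|Mon|Tue|Wed|Thu|Fri|Sat)(, )(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)( )([\d]{1,2})(, )([\d]{4})( at )([\d]{1,2})(\:)([\d]{1,2})( )(A|P)(M)")
REPLACEMENT = r"_\7-\3-\5_\13M_\9-\11_"

def extract_date_string(date_string_raw):
    # search() finds the date anywhere in the string; expand() fills the
    # template from the matched groups only, so surrounding text is dropped.
    m = SRCH.search(date_string_raw)
    if m is None:
        return None  # no date found
    return m.expand(REPLACEMENT)

print(extract_date_string("xxxxxMon, Oct 31, 2011 at 8:15 PMyyyyyy"))  # _2011-Oct-31_PM_8-15_
```

Alternatively, you can keep sub() and put .*? in front of the pattern and .* after it, without parentheses: the whole (single-line) string is then consumed by the match, and because no capturing group is added, the group numbers in the replacement stay the same.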
Upvotes: 0
Views: 218
Reputation: 1065
This code uses a regular expression that deletes everything at the start of the string before the first abbreviated weekday name, and everything at the end of the string after AM or PM. It then calls datetime.strptime(date_str, date_format), which does the hard work of parsing and returns a datetime instance:
```python
from datetime import datetime
import calendar
import re

# -------------------------------------
# _months = "|".join(calendar.month_abbr[1:])
_weekdays = "|".join(calendar.day_abbr)

_clean_regex = re.compile(r"""
    ^
    .*?
    (?=""" + _weekdays + """)
    |
    (?<=AM|PM)
    .*?
    $
    """, re.X)

# -------------------------------------
def parseRawDateString(raw_date_str):
    try:
        date_str = _clean_regex.sub("", raw_date_str)
        return datetime.strptime(date_str, "%a, %b %d, %Y at %I:%M %p")
    except ValueError as ex:
        print("Error parsing date from '{}'!".format(raw_date_str))
        raise ex

# -------------------------------------
if __name__ == "__main__":
    from sys import argv
    s = argv[1] if len(argv) > 1 else "xxxxxMon, Oct 31, 2011 at 8:15 PMyyyyyy"
    print("Raw date: '{}'".format(s))
    d = parseRawDateString(s)
    print("datetime object:")
    print(d)
    print("Formatted date: '{}'".format(d.strftime("%A, %d %B %Y @ %I:%M %p")))
```
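Because parseRawDateString returns a datetime, the numeric month asked about in Question No. 2 can come from strftime's %m instead of a lookup table. A minimal sketch, assuming the default "C" locale (so %p gives AM/PM); to_question_format is a hypothetical helper name. If a zero-padded hour such as 08 is acceptable, the shorter d.strftime("_%Y-%m-%d_%p_%I-%M_") does the same job.

```python
from datetime import datetime

def to_question_format(d):
    # %m is the zero-padded month number, so October becomes "10".
    # %p depends on the locale; Python starts in the "C" locale (AM/PM).
    hour12 = d.hour % 12 or 12  # 20 -> 8, 0 -> 12; unpadded like "8-15"
    return "_{0}_{1}_{2}-{3}_".format(
        d.strftime("%Y-%m-%d"), d.strftime("%p"), hour12, d.strftime("%M"))

print(to_question_format(datetime(2011, 10, 31, 20, 15)))  # _2011-10-31_PM_8-15_
```

Combined with the answer's function, to_question_format(parseRawDateString("xxxxxMon, Oct 31, 2011 at 8:15 PMyyyyyy")) should produce the asker's layout with a numeric month.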
Upvotes: 0
Reputation: 465
You probably want to use Python's standard date and time parsing (strptime):
http://docs.python.org/library/time.html#time.strptime
http://mail.python.org/pipermail/tutor/2006-March/045729.html
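For reference, a minimal sketch of time.strptime on the question's sample string. strptime does not skip extra characters (it raises ValueError), so the xxxxx/yyyyyy still need to be stripped first; the format string is the same one used in the answer above.

```python
import time

raw = "Mon, Oct 31, 2011 at 8:15 PM"
# Literal text such as "at" must appear verbatim in the input.
t = time.strptime(raw, "%a, %b %d, %Y at %I:%M %p")  # returns time.struct_time
print(t.tm_year, t.tm_mon, t.tm_mday, t.tm_hour, t.tm_min)  # 2011 10 31 20 15
print(time.strftime("_%Y-%m-%d_%p_%I-%M_", t))  # _2011-10-31_PM_08-15_
```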
Upvotes: 1