Reputation: 1253
I have strings that have dates in different formats. For example,
sample_str_1 = 'this amendment of lease, made and entered as of the 10th day of august, 2016, by and between john doe and jane smith'
Another string contains the date in this form:
sample_str_2 = 'this agreement, made and entered as of May 1, 2016, between john doe and jane smith'
To extract just the date from the first string, I tried this:
match = re.findall(r'\S+d{4}\s+', sample_str_1)
This gives an empty list.
For the second string, I used the same pattern and also got an empty list.
I also tried the datefinder module, and it gave me this output:
import datefinder
match = datefinder.find_dates(sample_str_1)
for m in match:
    print(m)
>> 2016-08-01 00:00:00
The above output is wrong; it should be 2016-08-10 00:00:00.
I also tried another approach based on an older post:
match = re.findall(r'\d{2}(?:january|february|march|april|may|june|july|august|september|october|november|december)\d{4}',sample_str_1)
This again gave me an empty list.
How can I extract dates like that from a string? Is there a generic method to extract dates that have text and digits? Any help would be appreciated.
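For reference, here is a small sketch of why the two regex attempts above likely come back empty. In the first pattern, `d{4}` has no backslash, so it looks for the literal letters `dddd` rather than four digits. The second pattern requires the day, month name, and year to sit directly next to each other, with no ordinal suffix, spaces, or comma in between. The names `months`, `strict`, and `loose` are illustrative only, and the `loose` pattern is just one assumed way to relax the match, not a general date parser.

```python
import re

sample_str_1 = ('this amendment of lease, made and entered as of the '
                '10th day of august, 2016, by and between john doe and jane smith')

# 'd{4}' means the letter d four times; the digit class is written '\d'.
literal_d = re.findall(r'\S+d{4}\s+', sample_str_1)
digits = re.findall(r'\d{4}', sample_str_1)
print(literal_d)  # []
print(digits)     # ['2016']

months = ('january|february|march|april|may|june|july|august|'
          'september|october|november|december')

# Day, month, and year must be adjacent here, so "10th day of august, 2016" fails.
strict = re.findall(r'\d{2}(?:%s)\d{4}' % months, sample_str_1)

# Relaxed variant: optional ordinal suffix, any non-digit filler before the
# month name, then an optional comma and whitespace before the year.
loose = re.findall(r'(\d{1,2})(?:st|nd|rd|th)?\D*?(%s),?\s*(\d{4})' % months,
                   sample_str_1)
print(strict)  # []
print(loose)   # [('10', 'august', '2016')]
```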
Upvotes: 1
Views: 212
Reputation: 3405
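Regex: (?:(\d{1,2})(?:st|nd|rd|th).* ([a-z]{3})[a-z]*|([a-z]{3})[a-z]* (\d{1,2})),\s*(\d{4})
Python code:
import datetime
import re

regex = re.compile(r'(?:(\d{1,2})(?:st|nd|rd|th).* ([a-z]{3})[a-z]*|([a-z]{3})[a-z]* (\d{1,2})),\s*(\d{4})', re.I)
for text in (sample_str_1, sample_str_2):
    for x in regex.findall(text):
        if x[0] == '':
            # "May 1, 2016" form: only groups 3-5 (month, day, year) are filled
            date = '-'.join(filter(None, x))
        else:
            # "10th day of august, 2016" form: groups 1, 2, 5 are day, month, year
            date = '%s-%s-%s' % (x[1], x[0], x[4])
        print(datetime.datetime.strptime(date, '%b-%d-%Y').date())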
Output:
2016-08-10
2016-05-01
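The same idea can be wrapped in a small helper, sketched below. The function name `extract_dates` and the constant `DATE_RE` are hypothetical. The ordinal alternation here includes `st` so that "1st" is covered. Because the pattern accepts any three-letter prefix as a month, candidates that `strptime` rejects are skipped instead of raising. This assumes an English locale for `%b`, and it only handles the two phrasings shown in the question.

```python
import datetime
import re

DATE_RE = re.compile(
    r'(?:(\d{1,2})(?:st|nd|rd|th).* ([a-z]{3})[a-z]*'
    r'|([a-z]{3})[a-z]* (\d{1,2})),\s*(\d{4})',
    re.I,
)


def extract_dates(text):
    """Return datetime.date objects for the two phrasings DATE_RE covers."""
    found = []
    for day1, mon1, mon2, day2, year in DATE_RE.findall(text):
        # Exactly one of the two alternatives fills its day/month groups.
        day, month = (day1, mon1) if day1 else (day2, mon2)
        try:
            parsed = datetime.datetime.strptime(
                '%s-%s-%s' % (month, day, year), '%b-%d-%Y')
        except ValueError:
            # e.g. "section 12, 2016": 'sec' is not a month abbreviation
            continue
        found.append(parsed.date())
    return found


sample_str_1 = ('this amendment of lease, made and entered as of the '
                '10th day of august, 2016, by and between john doe and jane smith')
sample_str_2 = ('this agreement, made and entered as of May 1, 2016, '
                'between john doe and jane smith')

print(extract_dates(sample_str_1))  # [datetime.date(2016, 8, 10)]
print(extract_dates(sample_str_2))  # [datetime.date(2016, 5, 1)]
```

Returning `date` objects rather than printing keeps the helper reusable. For free-form text with many other date styles, a dedicated parsing library is probably a better fit than extending this regex.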
Upvotes: 1