Reputation: 15629
I'm using Python 3.6.
I'm having an issue with reformatting dates. My code processes 610 dates, but it throws a ValueError on dates containing the month of August.
ERROR:time data 'Augu 30, 2017' does not match format '%B %d, %Y'
Here's the HTML string that my code is trying to reformat.
<td>
<div class="date">
<span data-date-format="MMMM Do, YYYY" data-date-value="2017-08-30T16:04:39.3+00:00" data-hook="datetime">August 30th, 2017</span>
</div>
</td>
The date in this string is August 30th, 2017, so what is causing the ValueError?
Here's my code:
publishedDateFormat = table.find('div', {'class': 'date'})
for date in publishedDateFormat.find('span'):
    cleanDate = date.replace('nd', '').replace('rd', '').replace('st', '').replace('th', '')
    locale.setlocale(locale.LC_ALL, 'en_US')
    publishedDate = datetime.datetime.strptime(cleanDate, '%B %d, %Y').strftime('%m%d%Y')
    list_of_cells.append(publishedDate)
Upvotes: 0
Views: 1111
Reputation: 1712
The line that is causing the issue is:
cleanDate = date.replace('nd', '').replace('rd', '').replace('st', '').replace('th', '')
You are getting rid of the "st" from "August".
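To make the failure concrete, here is a small sketch that replays the same chain of `replace` calls on the date text from the question and prints each intermediate result. The names `raw` and `steps` are only illustrative.

```python
raw = 'August 30th, 2017'

# Same chain of replacements as in the question. Each call strips the
# substring wherever it occurs, not only after the day number.
steps = [raw]
for suffix in ('nd', 'rd', 'st', 'th'):
    steps.append(steps[-1].replace(suffix, ''))

for step in steps:
    print(repr(step))
```

The last value printed is the same string that appears in the error message.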
I would recommend using regex (or some other means) to check if the character immediately preceding it is a digit ([0-9]).
Example of regex:
cleanDate = re.sub(r'([0-9])(nd|rd|st|th)', r'\g<1>', date)
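A self-contained sketch of this idea might look like the following. The helper name `reformat_date` and the sample dates are assumptions for illustration. It also relies on the default locale producing English month names for `%B`, and it adds a `\b` so the suffix must end at a word boundary.

```python
import re
from datetime import datetime

# Matches an ordinal suffix only when it directly follows a digit.
ORDINAL_SUFFIX = re.compile(r'(\d)(st|nd|rd|th)\b')

def reformat_date(text):
    """Strip the ordinal suffix from the day, then reformat as MMDDYYYY."""
    clean = ORDINAL_SUFFIX.sub(r'\1', text)
    return datetime.strptime(clean, '%B %d, %Y').strftime('%m%d%Y')

for sample in ('August 30th, 2017', 'March 1st, 2018', 'June 22nd, 2016'):
    print(sample, '->', reformat_date(sample))
```

Because the pattern requires a digit before the suffix, the "st" inside "August" is left alone.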
Upvotes: 1
Reputation: 2634
In your code
cleanDate = date.replace('nd', '').replace('rd', '').replace('st', '').replace('th', '')
replace('st', '') is changing August to Augu, which is causing the error.
Please correct your formatting logic.
Use a regex to capture the date fields and then build cleanDate from them, as follows:
import re, locale, datetime
# dateString is the date text extracted from the page
dateString = 'August 30th, 2017'
# capture month name, day number and year; the suffix and comma are skipped
dateValues = re.search(r'(\w+)[\s](\d+)[A-Za-z\s,]*(\d+)', dateString)
if dateValues:
    month, day, year = dateValues.groups()
    cleanDate = month + ' ' + day + ', ' + year
    locale.setlocale(locale.LC_ALL, 'en_US')
    publishedDate = datetime.datetime.strptime(cleanDate, '%B %d, %Y').strftime('%m%d%Y')
    print(publishedDate)
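As a different technique from the regex above, the span in the question also carries a machine-readable `data-date-value` attribute, so the visible text may not need parsing at all. The sketch below assumes that attribute is always present and starts with `YYYY-MM-DD`; the helper name `reformat_iso_value` is hypothetical. With BeautifulSoup the value would typically be read as `span['data-date-value']`.

```python
from datetime import datetime

def reformat_iso_value(value):
    # Only the leading YYYY-MM-DD part is needed, so the time and
    # UTC offset that follow it are ignored.
    return datetime.strptime(value[:10], '%Y-%m-%d').strftime('%m%d%Y')

# Attribute value copied from the HTML in the question.
print(reformat_iso_value('2017-08-30T16:04:39.3+00:00'))
```

This avoids both the ordinal suffix and any dependence on the locale's month names.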
Upvotes: 2