Life is complex

Reputation: 15629

ValueError: time data does not match format

I'm using Python 3.6.

I'm having an issue reformatting dates. My code processes 610 dates, but it throws a ValueError on every date that contains the month of August.

ERROR:time data 'Augu 30, 2017' does not match format '%B %d, %Y'

Here's the HTML string that my code is trying to reformat.

 <td>
   <div class="date">
     <span data-date-format="MMMM Do, YYYY" data-date-value="2017-08-30T16:04:39.3+00:00" data-hook="datetime">August 30th, 2017</span>
   </div>
 </td>

The date in this string is August 30th, 2017, so what is causing the ValueError?

Here's my code:

publishedDateFormat = table.find('div', {'class': 'date'})
for date in publishedDateFormat.find('span'):
   cleanDate = date.replace('nd', '').replace('rd', '').replace('st', '').replace('th', '')
   locale.setlocale(locale.LC_ALL, 'en_US')
   publishedDate = datetime.datetime.strptime(cleanDate, '%B %d, %Y').strftime('%m%d%Y')
   list_of_cells.append(publishedDate)

Upvotes: 0

Views: 1111

Answers (2)

Ollie

Reputation: 1712

The line that is causing the issue is:

cleanDate = date.replace('nd', '').replace('rd', '').replace('st', '').replace('th', '')

You are getting rid of the "st" from "August".
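To see this in isolation, here is a minimal sketch that runs the same chain of replace calls on the date text from the question. The variable names are only illustrative.

```python
# The date text from the question's HTML span
date = 'August 30th, 2017'

# Same chained replaces as in the question: each one removes the
# substring everywhere in the string, not only after the day number
clean = date.replace('nd', '').replace('rd', '').replace('st', '').replace('th', '')

print(clean)  # Augu 30, 2017
```

The 'st' replace hits "August" before the 'th' replace ever reaches "30th", which is exactly the string shown in the error message.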

I would recommend using a regex (or some other means) so that the suffix is removed only when the character immediately preceding it is a digit ([0-9]).

Example of regex:

cleanDate = re.sub(r'([0-9])(nd|rd|st|th)', r'\g<1>', date)
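For a self-contained version of this idea, here is a rough sketch. It assumes the date text has already been extracted as a plain string and that month names are in English (the default C locale is enough for %B here). The names ORDINAL_SUFFIX and reformat_date are hypothetical, and the lookbehind is just one way to express "preceded by a digit".

```python
import re
from datetime import datetime

# Match an ordinal suffix only when a digit comes right before it
# and a word boundary comes right after it
ORDINAL_SUFFIX = re.compile(r'(?<=\d)(st|nd|rd|th)\b')

def reformat_date(text):
    """Turn 'August 30th, 2017' into '08302017'."""
    clean = ORDINAL_SUFFIX.sub('', text)  # 'August 30, 2017'
    return datetime.strptime(clean, '%B %d, %Y').strftime('%m%d%Y')

print(reformat_date('August 30th, 2017'))  # 08302017
```

Because the "st" in "August" follows a letter rather than a digit, the month name is left untouched.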

Upvotes: 1

nandal

Reputation: 2634

In your code

cleanDate = date.replace('nd', '').replace('rd', '').replace('st', '').replace('th', '')

replace('st', '') is changing August to Augu, which is causing the error.

Kindly correct your formatting.

Use a regex to collect the date fields and then build the cleanDate string as follows:

import re, locale, datetime

# considering dateString is the string representation of Date from Text
dateString = 'August 30th, 2017'
dateValues = re.search(r'(\w+)[\s](\d+)[A-Za-z\s,]*(\d+)', dateString)
if dateValues:
    # groups() returns the captured month name, day number, and year
    month, day, year = dateValues.groups()
    cleanDate = month + ' ' + day + ', ' + year
    locale.setlocale(locale.LC_ALL, 'en_US')
    publishedDate = datetime.datetime.strptime(cleanDate, '%B %d, %Y').strftime('%m%d%Y')
    print(publishedDate)

Upvotes: 2
