john johnson

Reputation: 730

Converting a written date to date format in Python

I am using Python 2.7.

I have an Adobe PDF form that has a date field, and I extract its values using pdfminer. The problem is that Adobe Acrobat Reader lets the user type in strings like april 3rd 2017, 3rd April 2017, Apr 3rd 2017, 04/04/2017 or 4 3 2017. The date field is set to mm/dd/yyyy format, so Adobe displays the value as 04/03/2017, but the text the user typed is what is actually stored and what pdfminer pulls; clicking on the field shows that raw value. I think Adobe accepts the input and then does its own conversion to display the date as mm/dd/yyyy. Adobe can use JavaScript for more control, but I can't do that: the users can only have and use the PDF form, without any accompanying JavaScript file.

So I am looking for a way, using datetime in Python, to accept a written date such as the examples above as a string and convert it into a true mm/dd/yyyy format. I saw methods for converting long and short month names, but nothing that would handle ordinal days like 1st, 2nd, 3rd, 4th.

Upvotes: 1

Views: 5174

Answers (3)

Martin Evans

Reputation: 46759

You could just try each possible format in turn. First remove any st, nd or rd suffixes to make the testing easier:

import re
from datetime import datetime

formats = ["%B %d %Y", "%d %B %Y", "%b %d %Y", "%m/%d/%Y", "%m %d %Y"]
dates = ["april 3rd 2017", "3rd April 2017", "Apr 3rd 2017", "04/04/2017", "4 3 2017"]

for date in dates:
    # Strip ordinal suffixes only when they follow a digit, so "August" keeps its "st"
    date = re.sub(r"(?<=\d)(st|nd|rd|th)\b", "", date.lower())

    for format in formats:
        try:
            print(datetime.strptime(date, format).strftime("%m/%d/%Y"))
            break  # stop at the first match ("May" fits both %B and %b)
        except ValueError:
            pass

Which would display:

04/03/2017
04/03/2017
04/03/2017
04/04/2017
04/03/2017

This approach has the benefit of validating each date: for example, a month greater than 12 is rejected. You could flag any dates that fail all of the allowed formats.
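As a rough sketch of that flagging idea, the loop can be wrapped in a function that returns None when nothing matches. The names normalize_date, FORMATS and ORDINAL_SUFFIX are placeholders of mine, and the sketch assumes an English locale for the month names. It strips suffixes with a regular expression that only fires after a digit, which avoids chained replace() calls eating the "st" in "August":

```python
import re
from datetime import datetime

# Candidate layouts, tried in order (same list as above).
FORMATS = ["%B %d %Y", "%d %B %Y", "%b %d %Y", "%m/%d/%Y", "%m %d %Y"]

# Ordinal suffix directly after a digit, e.g. the "rd" in "3rd".
ORDINAL_SUFFIX = re.compile(r"(?<=\d)(st|nd|rd|th)\b", re.IGNORECASE)


def normalize_date(text):
    """Return the date as mm/dd/yyyy, or None if no format matches."""
    cleaned = ORDINAL_SUFFIX.sub("", text.strip())
    for fmt in FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).strftime("%m/%d/%Y")
        except ValueError:
            continue  # try the next layout
    return None


# Collect the inputs that failed every format so they can be reviewed.
inputs = ["april 3rd 2017", "1st August 2017", "13 45 2017"]
failed = [value for value in inputs if normalize_date(value) is None]
```

Here failed ends up holding only the inputs that matched no format, so they can be reported back to whoever filled in the form.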

Upvotes: 2

Kruupös

Reputation: 5474

Based on @MartinEvans's answer, but using the arrow library (it handles more cases than datetime, so you don't need replace() or lower()).

First install arrow:

pip install arrow

Then try each possible format:

import arrow

dates = ['april 3rd 2017', '3rd April 2017', 'Apr 3rd 2017', '04/04/2017', '4 3 2017']
formats = ['MMMM Do YYYY', 'Do MMMM YYYY', 'MMM Do YYYY', 'MM/DD/YYYY', 'M D YYYY']

def convert_datetime(date):
    for format in formats:
        try:
            print(arrow.get(date, format).format('MM/DD/YYYY'))
            break  # stop at the first format that parses
        except arrow.parser.ParserError:
            pass

for date in dates:
    convert_datetime(date)

Will output:

04/03/2017
04/03/2017
04/03/2017
04/04/2017
04/03/2017

If you are unsure what could be wrong with a date, you can also output an error message when none of the formats match:

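def convert_datetime(date):
    error = None  # last parsing error; 'e' itself is unbound after the except block in Python 3
    for format in formats:
        try:
            print(arrow.get(date, format).format('MM/DD/YYYY'))
            break
        except (arrow.parser.ParserError, ValueError) as e:
            error = e
    else:
        # Runs only if no format matched (the loop never hit 'break')
        print('For date: "{0}", {1}'.format(date, error))

convert_datetime('124 5 2017')  # test invalid date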

Will output the following error message:

For date: "124 5 2017", month must be in 1..12

Upvotes: 0

dev93

Reputation: 337

Just write a regular expression to get the number out of the string.

import re

s = '30Apr'
n = re.match(r'[0-9]+', s).group()  # leading run of digits
print(n)  # Will print 30

The rest (mapping the month name to a number and reading the year) should be easy.
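For what it's worth, here is one way the remaining steps might look. This is only a sketch under my own assumptions: the MONTHS table and the parse_written_date name are hypothetical, it expects an English month name plus a four-digit year, and it does not handle purely numeric inputs like 04/04/2017.

```python
import re
from datetime import date

# Hypothetical lookup: first three letters of the month name -> month number.
MONTHS = {name: number for number, name in enumerate(
    ["jan", "feb", "mar", "apr", "may", "jun",
     "jul", "aug", "sep", "oct", "nov", "dec"], start=1)}


def parse_written_date(text):
    """Return mm/dd/yyyy for inputs like '3rd April 2017', else None."""
    day = re.search(r"(?<!\d)(\d{1,2})(?!\d)", text)   # 1-2 digits, not part of the year
    month = re.search(r"[A-Za-z]{3,}", text)           # skips 2-letter suffixes like "rd"
    year = re.search(r"(?<!\d)(\d{4})(?!\d)", text)    # exactly four digits
    if not (day and month and year):
        return None
    month_number = MONTHS.get(month.group()[:3].lower())
    if month_number is None:
        return None
    try:
        parsed = date(int(year.group(1)), month_number, int(day.group(1)))
    except ValueError:  # impossible date, e.g. 31 April
        return None
    return parsed.strftime("%m/%d/%Y")
```

Because each part is searched for independently, the order of day, month and year in the input does not matter, which covers both "april 3rd 2017" and "3rd April 2017".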

Upvotes: 1
