alukach

Reputation: 6288

Retrieving a date from a complex string in Python

I'm trying to get a single datetime out of two strings using datetime.strptime.

The time is pretty easy (e.g. 8:53PM), so I can do something like:

theTime = datetime.strptime(givenTime, "%I:%M%p")

However, the other string contains more than just a date: it's a link in a format similar to http://site.com/?year=2011&month=10&day=5&hour=11. I know that I could do something like:

theDate = datetime.strptime(givenURL, "http://site.com/?year=%Y&month=%m&day=%d&hour=%H")

but I don't want to get that hour from the link since it's being retrieved elsewhere. Is there a way to put a dummy symbol (like %x or something) to serve as a flexible space for that last variable?

In the end, I envision having a single line similar to:

theDateTime = datetime.strptime(givenURL + givenTime, "http://site.com/?year=%Y&month=%m&day=%d&hour=%x%I:%M%p")

(although, obviously, the %x wouldn't be used). Any ideas?

Upvotes: 3

Views: 377

Answers (3)

eyquem

Reputation: 27575

import datetime
import re

givenURL  = 'http://site.com/?year=2011&month=10&day=5&hour=11'
givenTime = '08:53PM'

print ' givenURL == ' + givenURL
print 'givenTime == ' + givenTime

regx = re.compile(r'year=(\d\d\d\d)&month=(\d\d?)&day=(\d\d?)&hour=\d\d?')
print '\nmap(int,regx.search(givenURL).groups()) ==',map(int,regx.search(givenURL).groups())

theDate = datetime.date(*map(int,regx.search(givenURL).groups()))
theTime = datetime.datetime.strptime(givenTime, "%I:%M%p")

print '\ntheDate ==',theDate,type(theDate)
print '\ntheTime ==',theTime,type(theTime)


# copy the date parts from the URL onto the datetime parsed from the time string
theDateTime = theTime.replace(year=theDate.year, month=theDate.month, day=theDate.day)
print '\ntheDateTime ==',theDateTime,type(theDateTime)

result

 givenURL == http://site.com/?year=2011&month=10&day=5&hour=11
givenTime == 08:53PM

map(int,regx.search(givenURL).groups()) == [2011, 10, 5]

theDate == 2011-10-05 <type 'datetime.date'>

theTime == 1900-01-01 20:53:00 <type 'datetime.datetime'>

theDateTime == 2011-10-05 20:53:00 <type 'datetime.datetime'>
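For Python 3, here is a minimal sketch of the same approach that swaps the regex for the standard library's URL parser (urlparse and parse_qs from urllib.parse), so the order of the query parameters no longer matters. The function name combine is hypothetical, and the sketch assumes year, month and day are always present in the query string.

```python
from datetime import date, datetime
from urllib.parse import urlparse, parse_qs


def combine(given_url, given_time):
    # parse_qs maps each query key to a list of values, e.g. {'year': ['2011'], ...}
    query = parse_qs(urlparse(given_url).query)
    the_date = date(int(query["year"][0]), int(query["month"][0]), int(query["day"][0]))
    # strptime on the time alone yields a datetime dated 1900-01-01
    the_time = datetime.strptime(given_time, "%I:%M%p")
    # join the URL's date with the parsed time of day; the URL's 'hour' is ignored
    return datetime.combine(the_date, the_time.time())


print(combine("http://site.com/?year=2011&month=10&day=5&hour=11", "08:53PM"))
# 2011-10-05 20:53:00
```

datetime.combine() does the same job as the replace() call above, but starts from the date and attaches the time, which avoids the placeholder 1900-01-01 date entirely.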

Edit 1

Since strptime() is slow, I improved my code to eliminate it:

from datetime import datetime
import re
from time import clock


n = 10000

givenURL  = 'http://site.com/?year=2011&month=10&day=5&hour=11'
givenTime = '08:53AM'

# eyquem
regx = re.compile(r'year=(\d\d\d\d)&month=(\d\d?)&day=(\d\d?)&hour=\d\d? (\d\d?):(\d\d?)(PM|pm)?')
t0 = clock()
for i in xrange(n):
    given = givenURL + ' ' + givenTime
    mat = regx.search(given)
    grps = map(int,mat.group(1,2,3,4,5))
    grps[3] %= 12 # on a 12-hour clock, 12 AM is hour 0 and 12 PM is hour 12
    if mat.group(6):
        grps[3] += 12 # when it is PM/pm, the hour must be augmented with 12
    theDateTime1 = datetime(*grps)
print clock()-t0,"seconds   eyquem's code"
print theDateTime1


print

# Artsiom Rudzenka
dateandtimePattern = "http://site.com/?year=%Y&month=%m&day=%d&time=%I:%M%p"
t0 = clock()
for i in xrange(n):
    theDateTime2 = datetime.strptime(givenURL.split('&hour=')[0] + '&time=' + givenTime, dateandtimePattern)
print clock()-t0,"seconds   Artsiom's code"
print theDateTime2

print
print theDateTime1 == theDateTime2

result

0.460598763251 seconds   eyquem's code
2011-10-05 08:53:00

2.10386180366 seconds   Artsiom's code
2011-10-05 08:53:00

True

My code is about 4.5 times faster (0.46 s vs. 2.10 s for 10,000 conversions), which may matter if you have a lot of such conversions to perform.
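If you want to reproduce this comparison on Python 3 (where time.clock and xrange no longer exist), a rough sketch with timeit might look like the following. The function names fast_parse and strptime_parse are hypothetical; unlike the regex above, this one requires an explicit AM/PM suffix. Timings depend on the machine and Python version, so no numbers are claimed here.

```python
import re
import timeit
from datetime import datetime

# Captures year, month, day from the URL, then hour, minute and AM/PM from the time.
# Assumes the URL and the time are joined by a single space, as in the code above.
REGX = re.compile(
    r"year=(\d{4})&month=(\d\d?)&day=(\d\d?)&hour=\d\d? (\d\d?):(\d\d)([AaPp][Mm])"
)


def fast_parse(given_url, given_time):
    mat = REGX.search(given_url + " " + given_time)
    year, month, day, hour, minute = map(int, mat.group(1, 2, 3, 4, 5))
    # 12-hour clock: 12 AM becomes 0, 12 PM stays 12
    hour %= 12
    if mat.group(6).lower() == "pm":
        hour += 12
    return datetime(year, month, day, hour, minute)


def strptime_parse(given_url, given_time):
    pattern = "http://site.com/?year=%Y&month=%m&day=%d&time=%I:%M%p"
    return datetime.strptime(given_url.split("&hour=")[0] + "&time=" + given_time, pattern)


url = "http://site.com/?year=2011&month=10&day=5&hour=11"
assert fast_parse(url, "08:53AM") == strptime_parse(url, "08:53AM")
for f in (fast_parse, strptime_parse):
    print(f.__name__, timeit.timeit(lambda: f(url, "08:53AM"), number=10000))
```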

Upvotes: 1

Artsiom Rudzenka

Reputation: 29093

If you simply want to skip the hour in the URL, you can use split, for example like this:

from datetime import datetime

givenURL = 'http://site.com/?year=2011&month=10&day=5&hour=11'
pattern = "http://site.com/?year=%Y&month=%m&day=%d"
theDate = datetime.strptime(givenURL.split('&hour=')[0], pattern)

I'm not sure I understood you correctly, but if you want a single strptime call for both strings:

givenURL = 'http://site.com/?year=2011&month=10&day=5&hour=11'
givenTime = '8:53PM'
datePattern = "http://site.com/?year=%Y&month=%m&day=%d"
timePattern = "&time=%I:%M%p"

# drop everything from '&hour=' onwards, then append the time under a made-up '&time=' key
theDateTime = datetime.strptime(givenURL.split('&hour=')[0] + '&time=' + givenTime, datePattern + timePattern)
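The split('&hour=') trick assumes hour is the last query parameter, since everything after it is thrown away. A sketch of a slightly more robust variant, assuming hour is never the first parameter (so it is always preceded by '&'), removes just that parameter with re.sub before handing the URL to strptime. The function name parse_url_and_time is hypothetical.

```python
import re
from datetime import datetime

DATE_PATTERN = "http://site.com/?year=%Y&month=%m&day=%d"
TIME_PATTERN = "&time=%I:%M%p"


def parse_url_and_time(given_url, given_time):
    # Remove '&hour=NN' wherever it sits (assumes hour is never the first parameter),
    # so the remaining URL matches DATE_PATTERN exactly.
    without_hour = re.sub(r"&hour=\d+", "", given_url)
    return datetime.strptime(without_hour + "&time=" + given_time, DATE_PATTERN + TIME_PATTERN)


print(parse_url_and_time("http://site.com/?year=2011&month=10&hour=11&day=5", "08:53PM"))
# 2011-10-05 20:53:00
```

The format string still fixes the order of year, month and day; only the position of hour becomes flexible.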

Upvotes: 2

Brent Newey

Reputation: 4509

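There's no way to do that with the format string: strptime has no directive that matches a value and discards it. However, if the hour doesn't matter, you can parse it from the URL as in your first example and then overwrite it with theDateTime = theDateTime.replace(hour=hour_from_a_different_source). Note that replace() returns a new datetime rather than modifying the existing one.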

That way you don't have to do any additional parsing.
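A minimal Python 3 sketch of this replace() approach, assuming the URL always carries a valid 0-23 hour (strptime raises ValueError otherwise) and that the separate time string uses the question's %I:%M%p format. The helper name combine_with_replace is hypothetical. Since the question's time also has minutes, the sketch overwrites both hour and minute.

```python
from datetime import datetime

URL_PATTERN = "http://site.com/?year=%Y&month=%m&day=%d&hour=%H"


def combine_with_replace(given_url, given_time):
    # Parse the whole URL; the hour read here is only a placeholder
    from_url = datetime.strptime(given_url, URL_PATTERN)
    t = datetime.strptime(given_time, "%I:%M%p")
    # datetime objects are immutable, so replace() builds a new one
    return from_url.replace(hour=t.hour, minute=t.minute)


print(combine_with_replace("http://site.com/?year=2011&month=10&day=5&hour=11", "08:53PM"))
# 2011-10-05 20:53:00
```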

Upvotes: 0
