Johann Lau

Reputation: 185

Python Human-friendly string to datetime

I am working on a program that takes human-friendly input and converts it into Unix time (i.e. seconds since midnight on 1 January 1970, UTC).

Basically, the user would input some or all of the units in arbitrary order.

I've looked at similar questions and found some external sites that convert units into a timedelta. However, none suggested a way to convert the input into a datetime object. strptime is too strict and doesn't seem to allow the units to appear in different orders.

It doesn't matter whether the input is converted into a datetime.datetime object and then into Unix time, or directly into Unix time. I might not need the exact code, but I would be glad to be pointed in the right direction.

Upvotes: 1

Views: 451

Answers (3)

Niel Godfrey P. Ponciano

Reputation: 10709

Define the symbols and their meanings, e.g. y for year. Then use a regex to split the string into (value, unit) pairs, e.g. [('2030', 'y'), ('1', 'd'), ('4', 'M')]. With the symbol table and those pairs, we can construct a datetime object.

from datetime import datetime, MINYEAR
import re

UNITS = {
    "y": "year",
    "M": "month",
    "d": "day",
    "h": "hour",
    "m": "minute",
}

dt_re = re.compile(r"(\d+)([A-Za-z])")

for text_date in [
    "2030y1d4M",
    "2030y1d4M5m",
    "6h1d4M",
    "23h1993y59m25d12M",
]:
    unit_list = dt_re.findall(text_date)
    dt_kwargs = {"year": MINYEAR, "month": 1, "day": 1}  # Default values for required arguments

    for unit in unit_list:
        dt_kwargs[UNITS[unit[1]]] = int(unit[0])

    dt = datetime(**dt_kwargs)
    print(dt)

Output

2030-04-01 00:00:00
2030-04-01 00:05:00
0001-04-01 06:00:00
1993-12-25 23:59:00
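
Since the question ultimately asks for Unix time, the datetime needs one more conversion step. The following is a minimal sketch, assuming the input should be interpreted as UTC (the question does not say which timezone applies). The helper name to_unix and the default year of 1970 are hypothetical choices for illustration, not part of the code above.

```python
from datetime import datetime, timezone
import re

UNITS = {"y": "year", "M": "month", "d": "day", "h": "hour", "m": "minute"}
DT_RE = re.compile(r"(\d+)([A-Za-z])")


def to_unix(text_date, default_year=1970):
    """Parse e.g. '2030y1d4M' and return seconds since the Unix epoch (UTC assumed)."""
    kwargs = {"year": default_year, "month": 1, "day": 1}
    for value, unit in DT_RE.findall(text_date):
        kwargs[UNITS[unit]] = int(value)  # raises KeyError on an unknown unit symbol
    # Attach UTC explicitly; a naive datetime would be interpreted as local time
    dt = datetime(**kwargs, tzinfo=timezone.utc)
    return int(dt.timestamp())


print(to_unix("1970y1M1d"))  # → 0
print(to_unix("2030y1d4M"))  # → 1901232000
```

If local time is wanted instead, dropping the tzinfo argument makes timestamp() use the system timezone.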

Upvotes: 1

Daweo

Reputation: 36550

What you will surely need is to tokenize your input. Judging by your example inputs, it should be tokenizable with a regular expression. Consider the following example:

import re


def tokenize(x):
    return re.findall(r'(\d+)(\D+)', x)


d1 = "2030y1d4M"
d2 = "2030y1d4M5m"
d3 = "6h1d4M"
print(tokenize(d1))
print(tokenize(d2))
print(tokenize(d3))

Output

[('2030', 'y'), ('1', 'd'), ('4', 'M')]
[('2030', 'y'), ('1', 'd'), ('4', 'M'), ('5', 'm')]
[('6', 'h'), ('1', 'd'), ('4', 'M')]

Explanation: the tokenize function converts the input string into a list of 2-tuples, each containing a value (as a string) and a unit (also a string). Beware, however, that this assumes the user input belongs to a Chomsky Type 3 (regular) language; if that does not hold, a regular expression will not suffice.
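
One way to catch input that falls outside the expected format is to check the whole string before tokenizing, since re.findall silently skips anything it cannot match. This is only a sketch: the unit letters y, M, d, h, m are assumed from the other answers, and tokenize_strict is a hypothetical name.

```python
import re

TOKEN = r"\d+[yMdhm]"
FULL_RE = re.compile(rf"(?:{TOKEN})+")   # the whole input: one or more tokens, nothing else
PAIR_RE = re.compile(r"(\d+)([yMdhm])")  # a single (value, unit) token


def tokenize_strict(text):
    """Return (value, unit) pairs, or raise ValueError on malformed input."""
    if not FULL_RE.fullmatch(text):
        raise ValueError(f"not a valid date string: {text!r}")
    pairs = PAIR_RE.findall(text)
    units = [unit for _, unit in pairs]
    if len(units) != len(set(units)):
        raise ValueError(f"unit given more than once: {text!r}")
    return pairs


print(tokenize_strict("6h1d4M"))  # → [('6', 'h'), ('1', 'd'), ('4', 'M')]
```

An input such as "2030y1x" or "1d2d" raises ValueError here instead of being partially parsed.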

Upvotes: 1

Maurice Meyer

Reputation: 18106

You could use a regex to parse the user input:

import re
import datetime as dt
userDate = '6h2030y1d4M5m'

# Use a regex to parse the user input into {unit: value}
dateParts = {
    m[-1]: int(m[:-1])
    for m in re.findall(r'(\d{1,4}[yMdhms])', userDate)
}
now = dt.datetime.now()

# Construct the datetime object; fall back to datetime.now() for values missing from the user string.
# Units: y=year, M=month, d=day, h=hour, m=minute, s=second.
dObj = dt.datetime(
    dateParts.get('y', now.year), dateParts.get('M', now.month),
    dateParts.get('d', now.day), dateParts.get('h', now.hour),
    dateParts.get('m', now.minute), dateParts.get('s', now.second)
)

print(dObj)

Out (the seconds come from the current time, so they will vary):

2030-04-01 06:05:35
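
The same "default to now" idea can also be expressed with datetime.replace, which overrides only the fields that were given. A sketch under two assumptions: the unit letters follow the convention from the first answer (M for month, m for minute), and parse_with_defaults with its base parameter is a hypothetical helper added so the result can be reproduced.

```python
import re
from datetime import datetime

FIELDS = {"y": "year", "M": "month", "d": "day",
          "h": "hour", "m": "minute", "s": "second"}


def parse_with_defaults(text, base=None):
    """Fill fields missing from `text` with the corresponding fields of `base`."""
    base = base or datetime.now()
    overrides = {
        FIELDS[unit]: int(value)
        for value, unit in re.findall(r"(\d{1,4})([yMdhms])", text)
    }
    # replace() validates the combined date and raises ValueError if it is impossible
    return base.replace(microsecond=0, **overrides)


print(parse_with_defaults("6h2030y1d4M5m", base=datetime(2024, 1, 1, 12, 30, 35)))
# → 2030-04-01 06:05:35
```

Passing a fixed base makes the function deterministic for testing; leaving it out uses the current time.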

Upvotes: 1
