Reputation: 73
I am trying to create a datetime object from a line in a log file.
I have been trying to parse it with a regex, but it fails once I reach the time portion, which is joined to the date by the letter 'T'. My test string is 'ERROR 2019-02-03T23:21:20 cannot find file'
def convert_to_datetime(line):
    match = re.search('\d{4}-\d{2}-\d{2}', line)
I am struggling to get the full date and time out of the string. I have tried several regexes, but I think I am using the wrong syntax.
Upvotes: 2
Views: 4737
Reputation: 705
Depending on what format you want the final string, here are 2 ways you can do this:
import re

def convert_to_datetime(line: str) -> str:
    # Extract the date and the time separately, then join them.
    match = re.search(r'\d{4}-\d{2}-\d{2}', line).group()
    match += ' | ' + re.search(r'\d{2}:\d{2}:\d{2}', line).group()
    return match

def cut_out_datetime(line: str) -> str:
    # Drop the 'ERROR ' prefix, then swap the 'T' separator for ' | '
    # (assumes the rest of the line contains no other 'T').
    line = re.sub('ERROR ', "", line)
    line = re.sub('T', " | ", line)
    return line

s = 'ERROR 2019-02-03T23:21:20'
print(' Test string: ', s)
print()
print('Extract method: ', convert_to_datetime(s))
print(' "Trim" method: ', cut_out_datetime(s))
# OUTPUT:
Test string: ERROR 2019-02-03T23:21:20
Extract method: 2019-02-03 | 23:21:20
"Trim" method: 2019-02-03 | 23:21:20
There are other ways using positions and slicing, but these are closest to your original code. Replace the | with whatever separator you like, or split the date and time into two separate strings.
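If you'd rather have the date and time as two separate strings, one option is a single pattern with named groups. This is only a sketch: the helper name split_date_time and the group names are made up for illustration, and it assumes the timestamp always looks like YYYY-MM-DDTHH:MM:SS.

```python
import re

# Hypothetical helper, not part of the code above.
# Assumes the timestamp has the form YYYY-MM-DDTHH:MM:SS.
TIMESTAMP = re.compile(r'(?P<date>\d{4}-\d{2}-\d{2})T(?P<time>\d{2}:\d{2}:\d{2})')

def split_date_time(line: str):
    """Return (date, time) as two strings, or None if no timestamp is found."""
    match = TIMESTAMP.search(line)
    if match is None:
        return None
    return match.group('date'), match.group('time')

print(split_date_time('ERROR 2019-02-03T23:21:20 cannot find file'))
# ('2019-02-03', '23:21:20')
```

Returning None for lines without a timestamp is a design choice here; raising an exception would work just as well if every line is expected to carry one.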
Upvotes: 0
Reputation: 20490
You need to pull the matched text out of the match object with group() and print it.
import re
s = 'ERROR 2019-02-03T23:21:20 cannot find file'
match = re.search(r'\d{4}-\d{2}-\d{2}', s)
print(match.group(0))
#2019-02-03
If you want the whole datetime string, extend the pattern to cover the time as well:
import re
s = 'ERROR 2019-02-03T23:21:20 cannot find file'
match = re.search(r'\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}', s)
print(match.group(0))
#2019-02-03T23:21:20
After this, if you want a datetime object, you can use the python-dateutil library (https://pypi.org/project/python-dateutil/):
from dateutil import parser
import re
s = 'ERROR 2019-02-03T23:21:20 cannot find file'
match = re.search(r'\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}', s)
#Datetime string
dt = match.group(0)
#Datetime object
dt_obj = parser.parse(dt)
print(dt_obj)
#2019-02-03 23:21:20
print(type(dt_obj))
#<class 'datetime.datetime'>
Or, the best solution: call parser.parse directly on the whole line with fuzzy=True, which ignores the text around the timestamp:
from dateutil import parser
s = 'ERROR 2019-02-03T23:21:20 cannot find file'
print(parser.parse(s, fuzzy=True))
#2019-02-03 23:21:20
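If you'd prefer not to add a dependency, the same regex match can be handed to datetime.strptime from the standard library. This is a minimal sketch, assuming every timestamp has exactly the form shown in the test string; returning None for lines without a timestamp is my own choice, not something the question asked for.

```python
import re
from datetime import datetime

# Assumes timestamps always look like YYYY-MM-DDTHH:MM:SS.
TIMESTAMP = re.compile(r'\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}')

def convert_to_datetime(line):
    """Return a datetime parsed from the first timestamp in line, or None."""
    match = TIMESTAMP.search(line)
    if match is None:
        return None
    # The format string mirrors the regex, including the literal 'T'.
    return datetime.strptime(match.group(0), '%Y-%m-%dT%H:%M:%S')

dt_obj = convert_to_datetime('ERROR 2019-02-03T23:21:20 cannot find file')
print(dt_obj)
#2019-02-03 23:21:20
```

Note that strptime still raises ValueError when the digits match the pattern but are not a real date (for example, month 13), so wrap the call in try/except if the logs may contain such values.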
Upvotes: 1
Reputation: 7211
Not sure if this is what you want, but generating a datetime object from a string can get very complicated when the string is free-form. The dateutil package can help:
>>> import dateutil.parser
>>> s = 'ERROR 2019-02-03T23:21:20 cannot find file'
>>> dateutil.parser.parse(s, fuzzy=True)
datetime.datetime(2019, 2, 3, 23, 21, 20)
So if you like it, this is the function:
def convert_to_datetime(s):
    return dateutil.parser.parse(s, fuzzy=True)
Upvotes: 3
Reputation: 521
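First, per https://docs.python.org/3/library/re.html, be aware that in Python 3 \d is not exactly equivalent to [0-9]: for str patterns it matches any Unicode decimal digit unless the re.ASCII flag is set.
Then, be careful when there is no match: pattern.search returns None, and calling .group() on None raises an AttributeError.
Try something like: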
pattern = re.compile(r'[0-9]{4}-[0-9]{2}-[0-9]{2}')
match = pattern.search(line)  # search once and reuse the result
if match:
    matches.append(match)
...
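To make both points concrete, here is a small demonstration. It is just a sketch: the Arabic-Indic digits are an arbitrary example of non-ASCII decimal digits, and find_date is a made-up helper name.

```python
import re

# Arabic-Indic digits spelling 2019-02-03, written as escapes
# (chosen only to illustrate the point).
unicode_date = '\u0662\u0660\u0661\u0669-\u0660\u0662-\u0660\u0663'

# For str patterns, \d matches any Unicode decimal digit by default...
print(bool(re.search(r'\d{4}-\d{2}-\d{2}', unicode_date)))            # True
# ...while [0-9], or \d with re.ASCII, only matches ASCII digits.
print(bool(re.search(r'[0-9]{4}-[0-9]{2}-[0-9]{2}', unicode_date)))   # False
print(bool(re.search(r'\d{4}-\d{2}-\d{2}', unicode_date, re.ASCII)))  # False

def find_date(line):
    """Hypothetical helper: return the date text, or None when nothing matches."""
    match = re.search(r'[0-9]{4}-[0-9]{2}-[0-9]{2}', line)
    # Check for None before calling .group(), otherwise AttributeError is raised.
    return match.group(0) if match else None

print(find_date('ERROR 2019-02-03T23:21:20 cannot find file'))  # 2019-02-03
print(find_date('no date in this line'))                        # None
```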
Upvotes: 0
Reputation: 7889
You're close. You just need to return the matched text:
import re
def convert_to_datetime(line):
    match = re.search(r'\d{4}-\d{2}-\d{2}', line)
    return match.group() if match else "No match"
Test:
t = convert_to_datetime('ERROR 2019-02-03T23:21:20 cannot find file')
print(t)
Output:
2019-02-03
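One thing to watch with the "No match" return value: the caller gets a string either way and has to compare against that exact text. A possible variation, sketched here under the assumption that only the date part is needed, returns a datetime.date or None instead. The name convert_to_date and the second log line are hypothetical.

```python
import re
from datetime import date

def convert_to_date(line):
    """Return the first YYYY-MM-DD in line as a date object, or None."""
    match = re.search(r'\d{4}-\d{2}-\d{2}', line)
    # date.fromisoformat (Python 3.7+) parses YYYY-MM-DD strings.
    return date.fromisoformat(match.group()) if match else None

lines = [
    'ERROR 2019-02-03T23:21:20 cannot find file',
    'INFO starting up',  # made-up line with no date
]
print([convert_to_date(line) for line in lines])
# [datetime.date(2019, 2, 3), None]
```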
Upvotes: 0