I have been trying to follow the guidelines in this post with some tweaks: The best way to filter a log by a dates range in python
My log file looks like this:
2020-Oct-12 13:38:57.742759 -0700 : (some text)
(line of text)
(line of text)
2020-Oct-12 13:38:57.742760 -0700 : (some text)
...
2020-Oct-12 13:57:57.742759 -0700 : (some text)
I tried the two code snippets below, but neither of them writes anything to the output file. Is there something wrong with the way I define the date-times?
myfile = open('DIR_extract.log', 'w')
with open('DIRDriver.log','r') as f:
    for line in f:
        d = line.split(" ",1)[0]
        if d >= '2020-10-12 13:38:57' and d <= '2020-10-12 13:57:57':
            myfile.write("%s\n" % line)
and also:
from itertools import dropwhile, takewhile
myfile = open('DIR_extract.log', 'w')
from_dt, to_dt = '2020-10-12 13:38:57', '2020-10-12 13:57:57'
with open('DIRDriver.log') as fin:
    of_interest = takewhile(lambda L: L <= to_dt, dropwhile(lambda L: L < from_dt, fin))
    for line in of_interest:
        myfile.write("%s\n" % line)
You almost got there.
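The line
d = line.split(" ",1)[0]
only returns the first part of the timestamp, e.g. 2020-Oct-12.
That is because your timestamp format is different from the one in the answer you linked to: you have a space between the date and the time, so splitting on the first space cuts the time off. On top of that, your bounds use a numeric month (2020-10-12) while the log uses an abbreviated month name (2020-Oct-12), so the strings never compare the way you expect.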
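To see why nothing was written, you can check the comparisons directly in the interpreter. This is a small sketch using the first timestamp from the log and the bounds from the question; the variable names are just for illustration.

```python
# A line taken from the log, and the bounds used in the question.
line = "2020-Oct-12 13:38:57.742759 -0700 : (some text)\n"
start, end = "2020-10-12 13:38:57", "2020-10-12 13:57:57"

d = line.split(" ", 1)[0]
print(d)  # -> 2020-Oct-12 (the time was split off)

# Strings compare character by character: at index 5, 'O' (ord 79) > '1' (ord 49),
# so every "2020-Oct-..." string sorts after every "2020-10-..." bound.
print(start <= d <= end)  # -> False

# Taking the full date-time prefix AND writing the bounds in the log's own
# format makes the string comparison meaningful again.
prefix = line[:len("2020-Oct-12 13:38:57")]
print("2020-Oct-12 13:38:57" <= prefix <= "2020-Oct-12 13:57:57")  # -> True
```

The same ordering explains why the dropwhile/takewhile version produced nothing: no log line sorts before from_dt, so nothing is dropped, and the very first line already sorts after to_dt, so takewhile stops immediately.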
So to make it work, you need to take the whole date-and-time part of each line and write the bounds in the same format as the log.
dt_start = '2020-Oct-12 13:38:57'
dt_end = '2020-Oct-12 13:57:57'
str_time_len = len(dt_start)
with open('DIR_extract.log', 'w+') as myfile:
    with open('DIRDriver.log','r') as f:
        for line in f:
            date_time = line[:str_time_len]
            if dt_start <= date_time <= dt_end:
                myfile.write(line)
Assuming the log file contains:
2020-Oct-12 13:35:57.742759 -0700 : before
2020-Oct-12 13:38:57.742759 -0700 : start
2020-Oct-12 13:54:57.742759 -0700 : inside
2020-Oct-12 13:57:57.742759 -0700 : end
2020-Oct-12 13:59:57.742759 -0700 : outside
The code above writes this to DIR_extract.log:
2020-Oct-12 13:38:57.742759 -0700 : start
2020-Oct-12 13:54:57.742759 -0700 : inside
2020-Oct-12 13:57:57.742759 -0700 : end
Note that because your log uses abbreviated month names (MMM, or %b in strptime terms), the string comparison above only works when the whole range falls within a single month. Filtering from Jan to Apr, for example, won't work, because as strings "Jan" > "Apr". You will need to convert those strings to datetime objects.
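Here is a quick sketch of the problem and the fix: as strings, month abbreviations sort alphabetically, while datetime.strptime with %b turns them into real dates that sort by the calendar. It assumes an English (C) locale, since %b matches month names for the current locale.

```python
from datetime import datetime

fmt = "%Y-%b-%d %H:%M:%S"  # %b = abbreviated month name, e.g. "Oct"

jan = "2020-Jan-15 08:00:00"
apr = "2020-Apr-15 08:00:00"

# As plain strings, "Jan" sorts after "Apr" because 'J' > 'A'.
print(jan < apr)  # -> False

# Parsed into datetime objects, the comparison follows the calendar.
print(datetime.strptime(jan, fmt) < datetime.strptime(apr, fmt))  # -> True

# strptime needs the string to match the format exactly, so slice off the
# fractional seconds and UTC offset before parsing a log line.
line = "2020-Oct-12 13:38:57.742759 -0700 : start"
ts = datetime.strptime(line[:20], fmt)
print(ts)  # -> 2020-10-12 13:38:57
```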
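Moreover, if some log records span multiple lines, you need to keep all of their lines, not just the line that starts with the timestamp. The version below parses the timestamps into datetime objects and also copies the continuation lines of each record in the range (it assumes the log is in chronological order):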
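import re
from datetime import datetime
_start = '2020-Oct-12 13:38:57'
_end = '2020-Oct-12 13:57:57'
dt_fmt = '%Y-%b-%d %H:%M:%S'  # %b matches abbreviated month names such as "Oct"
dt_reg = r'\d{4}-[A-Za-z]{3}-\d{2}'  # a line starting with this begins a new record
dt_start = datetime.strptime(_start, dt_fmt)
dt_end = datetime.strptime(_end, dt_fmt)
str_time_len = len(_start)  # length of the "YYYY-Mon-DD HH:MM:SS" prefix
with open('DIR_extract.log', 'w+') as myfile:
    with open('DIRDriver.log','r') as f:
        started = False
        for line in f:
            if re.match(dt_reg, line):
                # Timestamped line: parse only the prefix (fractional seconds and offset are ignored)
                datetime_str = line[:str_time_len]
                dt = datetime.strptime(datetime_str, dt_fmt)
                if not started and dt >= dt_start:
                    started = True
                elif started and dt > dt_end:
                    break  # first record past the end of the range
            # Continuation lines fall through and are written once the range has started
            if not started:
                continue
            myfile.write(line)
            print(line.strip())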
Assuming the log file contains:
2020-Oct-12 13:35:57.742759 -0700 : before
2020-Oct-12 13:38:57.742759 -0700 : start
2020-Oct-12 13:54:57.742759 -0700 : inside
(line of text)
(line of text)
2020-Oct-12 13:57:57.742759 -0700 : end
2020-Oct-12 13:59:57.742759 -0700 : outside
It writes (and prints):
2020-Oct-12 13:38:57.742759 -0700 : start
2020-Oct-12 13:54:57.742759 -0700 : inside
(line of text)
(line of text)
2020-Oct-12 13:57:57.742759 -0700 : end
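If you would rather keep the dropwhile/takewhile structure from the question, one possible approach (a sketch, not the only way) is to tag every line with the timestamp of the record it belongs to, so continuation lines inherit the timestamp of the line above them. Like the loop above, it assumes the log is in chronological order. The helper names tag_with_timestamp and filter_log, and the in-memory sample list, are hypothetical and only for illustration.

```python
import re
from datetime import datetime
from itertools import dropwhile, takewhile

DT_FMT = "%Y-%b-%d %H:%M:%S"
DT_LEN = len("2020-Oct-12 13:38:57")  # length of the timestamp prefix
# A record starts with a timestamp such as "2020-Oct-12 13:38:57".
DT_REG = re.compile(r"\d{4}-[A-Za-z]{3}-\d{2} \d{2}:\d{2}:\d{2}")


def tag_with_timestamp(lines):
    """Yield (timestamp, line) pairs; continuation lines reuse the last timestamp."""
    current = None
    for line in lines:
        if DT_REG.match(line):
            current = datetime.strptime(line[:DT_LEN], DT_FMT)
        if current is not None:  # skip anything before the first timestamp
            yield current, line


def filter_log(lines, start, end):
    """Yield the lines of every record whose timestamp lies in [start, end].

    Assumes the records are in chronological order, as dropwhile/takewhile require.
    """
    tagged = tag_with_timestamp(lines)
    in_range = takewhile(lambda p: p[0] <= end,
                         dropwhile(lambda p: p[0] < start, tagged))
    return (line for _, line in in_range)


# Demo on an in-memory copy of the sample log; an open file object works the same way.
sample_log = [
    "2020-Oct-12 13:35:57.742759 -0700 : before\n",
    "2020-Oct-12 13:38:57.742759 -0700 : start\n",
    "2020-Oct-12 13:54:57.742759 -0700 : inside\n",
    "(line of text)\n",
    "(line of text)\n",
    "2020-Oct-12 13:57:57.742759 -0700 : end\n",
    "2020-Oct-12 13:59:57.742759 -0700 : outside\n",
]
start = datetime.strptime("2020-Oct-12 13:38:57", DT_FMT)
end = datetime.strptime("2020-Oct-12 13:57:57", DT_FMT)
selected = list(filter_log(sample_log, start, end))
print("".join(selected), end="")
```

For the sample log this selects the same five lines as the loop above. To work on the real files, pass the open file object instead: with open('DIRDriver.log') as fin, open('DIR_extract.log', 'w') as fout: fout.writelines(filter_log(fin, start, end)). Since each line already ends with a newline, it is written as-is; the "%s\n" % line in the question adds a second newline and produces blank lines.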