srkdb

Reputation: 815

Export data from log file between time range in Python

I have been trying to follow the guidelines in this post with some tweaks: The best way to filter a log by a dates range in python

My log file is like:

2020-Oct-12 13:38:57.742759 -0700 : (some text)
(line of text)
(line of text)
2020-Oct-12 13:38:57.742760 -0700 : (some text)
...
2020-Oct-12 13:57:57.742759 -0700 : (some text)

I tried these two code snippets, but neither writes anything to the output file. Is there something wrong with the date-time definition?

myfile = open('DIR_extract.log', 'w')
with open('DIRDriver.log','r') as f:
    for line in f:
        d = line.split(" ",1)[0] 
        if d >= '2020-10-12 13:38:57' and d <= '2020-10-12 13:57:57':
            myfile.write("%s\n" % line)

and also

myfile = open('DIR_extract.log', 'w')
from itertools import dropwhile, takewhile
from_dt, to_td = '2020-10-12 13:38:57', '2020-10-12 13:57:57'
with open('DIRDriver.log') as fin:
    of_interest = takewhile(lambda L: L <= to_td, dropwhile(lambda L: L < from_dt, fin))
    for line in of_interest:
        myfile.write("%s\n" % line)

Upvotes: 0

Views: 1059

Answers (1)

dragon2fly

Reputation: 2419

You almost got there.

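d = line.split(" ",1)[0] only returns the first part of the timestamp, e.g. 2020-Oct-12. That is because your datetime format differs from the one in the answer you linked to: yours has a space between the date and the time. Your comparison bounds also write the month as 10, while the log writes it as Oct, so the bounds must use the same format as the log.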
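To see the difference concretely, here is a small sketch using one line from the sample log. The variable names are only illustrative and are not part of the original code.

```python
line = "2020-Oct-12 13:38:57.742759 -0700 : (some text)"

# Splitting once on the first space keeps only the date part.
date_only = line.split(" ", 1)[0]

# Splitting twice and joining the first two fields keeps date and time.
date_part, time_part, _rest = line.split(" ", 2)
date_and_time = date_part + " " + time_part

# Slicing to the length of a reference timestamp drops the microseconds.
prefix = line[:len("2020-Oct-12 13:38:57")]

print(date_only)      # 2020-Oct-12
print(date_and_time)  # 2020-Oct-12 13:38:57.742759
print(prefix)         # 2020-Oct-12 13:38:57
```

Slicing is the simplest of these options as long as every timestamp has the same fixed width, which holds for the sample shown in the question.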

So to make it work, you need to capture both the date and the time part of each line.

dt_start = '2020-Oct-12 13:38:57'
dt_end = '2020-Oct-12 13:57:57'
str_time_len = len(dt_start)

with open('DIR_extract.log', 'w+') as myfile:
    with open('DIRDriver.log','r') as f:
        for line in f:
            date_time = line[:str_time_len]
            if dt_start <= date_time <= dt_end:
                myfile.write(line)

Assuming the log file content is

2020-Oct-12 13:35:57.742759 -0700 : before
2020-Oct-12 13:38:57.742759 -0700 : start
2020-Oct-12 13:54:57.742759 -0700 : inside
2020-Oct-12 13:57:57.742759 -0700 : end
2020-Oct-12 13:59:57.742759 -0700 : outside

The code above gives

2020-Oct-12 13:38:57.742759 -0700 : start
2020-Oct-12 13:54:57.742759 -0700 : inside
2020-Oct-12 13:57:57.742759 -0700 : end

Note that because your log uses the MMM format for months, the code above only works when the whole range falls within a single month. Filtering from Jan to Apr, for example, won't work because 'Jan' > 'Apr' when compared as strings. For that you need to convert the strings to datetime objects.
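A quick way to convince yourself of this is the sketch below. The two timestamps are made-up examples, and it assumes an English locale, since %b parses locale-dependent month abbreviations.

```python
from datetime import datetime

# Assumption: English locale, so %b understands 'Jan', 'Apr', 'Oct', ...
dt_fmt = '%Y-%b-%d %H:%M:%S'

# Made-up timestamps, three months apart.
jan = '2020-Jan-15 10:00:00'
apr = '2020-Apr-15 10:00:00'

# As plain strings, 'J' sorts after 'A', so January looks later than April.
string_order_ok = jan < apr

# As datetime objects, the comparison follows the calendar.
datetime_order_ok = datetime.strptime(jan, dt_fmt) < datetime.strptime(apr, dt_fmt)

print(string_order_ok)    # False
print(datetime_order_ok)  # True
```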

Moreover, if some log records span multiple lines, you need to capture every line of the record, not just the line that starts with the datetime.

import re
from datetime import datetime

_start = '2020-Oct-12 13:38:57'
_end = '2020-Oct-12 13:57:57'


dt_fmt = '%Y-%b-%d %H:%M:%S'
dt_reg = r'\d{4}-[A-Za-z]{3}-\d{2}'
dt_start = datetime.strptime(_start, dt_fmt)
dt_end = datetime.strptime(_end, dt_fmt)

str_time_len = len(_start)

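with open('DIR_extract.log', 'w+') as myfile:
    with open('DIRDriver.log','r') as f:
        started = False
        for line in f:
            # Only lines that begin with a date open a new record.
            if re.match(dt_reg, line):
                datetime_str = line[:str_time_len]
                dt = datetime.strptime(datetime_str, dt_fmt)
                if dt > dt_end:
                    # Past the range; assumes the log is in chronological order.
                    break
                if not started and dt >= dt_start:
                    started = True

            # Skip everything before the first record in range.
            if not started:
                continue

            # Continuation lines are written along with the record they follow.
            myfile.write(line)
            print(line.strip())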

Assume the log file content is as below:

2020-Oct-12 13:35:57.742759 -0700 : before
2020-Oct-12 13:38:57.742759 -0700 : start
2020-Oct-12 13:54:57.742759 -0700 : inside
(line of text)
(line of text)
2020-Oct-12 13:57:57.742759 -0700 : end
2020-Oct-12 13:59:57.742759 -0700 : outside

It gives you:

2020-Oct-12 13:38:57.742759 -0700 : start
2020-Oct-12 13:54:57.742759 -0700 : inside
(line of text)
(line of text)
2020-Oct-12 13:57:57.742759 -0700 : end
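If the same filter is needed in more than one place, the loop above could be wrapped in a generator that accepts any iterable of lines. This is only a sketch under the same assumptions (chronologically ordered log, English month abbreviations). The name filter_log_lines and the in-memory sample list are hypothetical and not part of the question.

```python
import re
from datetime import datetime

DT_FMT = '%Y-%b-%d %H:%M:%S'
# Matches the date and the time up to the seconds, e.g. 2020-Oct-12 13:38:57
DT_REG = re.compile(r'\d{4}-[A-Za-z]{3}-\d{2} \d{2}:\d{2}:\d{2}')


def filter_log_lines(lines, start, end):
    """Yield the lines of every record whose timestamp lies in [start, end].

    Hypothetical helper: assumes records are in chronological order and
    that continuation lines do not start with a timestamp.
    """
    dt_start = datetime.strptime(start, DT_FMT)
    dt_end = datetime.strptime(end, DT_FMT)
    started = False
    for line in lines:
        match = DT_REG.match(line)
        if match:
            dt = datetime.strptime(match.group(0), DT_FMT)
            if dt > dt_end:
                break  # everything after this point is out of range
            started = dt >= dt_start
        if started:
            yield line


# In-memory stand-in for the log file shown above.
sample = [
    "2020-Oct-12 13:35:57.742759 -0700 : before\n",
    "2020-Oct-12 13:38:57.742759 -0700 : start\n",
    "2020-Oct-12 13:54:57.742759 -0700 : inside\n",
    "(line of text)\n",
    "(line of text)\n",
    "2020-Oct-12 13:57:57.742759 -0700 : end\n",
    "2020-Oct-12 13:59:57.742759 -0700 : outside\n",
]

result = list(filter_log_lines(sample, '2020-Oct-12 13:38:57', '2020-Oct-12 13:57:57'))
print("".join(result), end="")
```

Because an open file is itself an iterable of lines, the same function can be fed the file object from open('DIRDriver.log') and its output passed to myfile.writelines(...).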

Upvotes: 1
