lpkej

Reputation: 485

Parse log between datetime range using Python

I'm trying to write a function that takes two datetime values and reads the log lines between them, for example:

    start_point = "2019-04-25 09:30:46.781"
    stop_point =  "2019-04-25 10:15:49.109"

I'm thinking of an algorithm that checks:

  1. if the dates are equal:
    • check whether the first character of the start hour (09 -> 0) is greater or less than the first character of the stop hour (10 -> 1);
    • same check with the second hour character ((start) 09 -> 9, (stop) 10 -> 0);
    • same check with the first minute character;
    • same check with the second minute character;
  2. if the dates differ:
    • some other checks...
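
As a point of comparison, the character-by-character checks listed above are what a lexicographic string comparison already does. A minimal sketch, assuming every timestamp is zero-padded and uses the fixed `YYYY-MM-DD HH:MM:SS.mmm` layout from the example (the helper name `in_range` is hypothetical):

```python
def in_range(timestamp, start_point, stop_point):
    """Return True if timestamp lies between the two bounds (inclusive).

    Assumes all three strings share the same zero-padded, fixed-width
    'YYYY-MM-DD HH:MM:SS.mmm' layout, so comparing them character by
    character from the left gives chronological order.
    """
    return start_point <= timestamp <= stop_point


start_point = "2019-04-25 09:30:46.781"
stop_point = "2019-04-25 10:15:49.109"

print(in_range("2019-04-25 09:45:00.000", start_point, stop_point))  # True
print(in_range("2019-04-25 10:15:49.110", start_point, stop_point))  # False
print(in_range("2019-04-24 09:45:00.000", start_point, stop_point))  # False
```

This only holds for well-formed timestamps. It does not validate the value, so a line that merely starts with similar-looking text could slip through.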

I don't know whether I'm reinventing the wheel, but I'm really lost. Here is what I have tried:

1.

    ...
    cmd = subprocess.Popen(['egrep "2019-04-19 ([0-1][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9].[0-9]{3}" file.log'], shell=True, stdout=subprocess.PIPE)
    cmd_result = cmd.communicate()[0]
    for i in str(cmd_result).split("\n"):
        print(i)
    ...

The problem with this one: when I built the pattern from the example values, it could not work, because it produces invalid character ranges. For example, the second hour character gives the range [9-0], and the first minute character gives [3-1].

2. I tried the solutions from "The best way to filter a log by a dates range in python".

Any help is appreciated.

EDIT

The log line structure:

    ...
    2019-04-25 09:30:46.781 text text text ...
    2019-04-25 09:30:46.853 text text text ...
    ...

EDIT 2

So I tried the code:

    from datetime import datetime as dt

    s1 = "2019-04-25 09:34:11.057"
    s2 = "2019-04-25 09:59:43.534"

    start = dt.strptime('2019-04-25 09:34:11.057','%Y-%m-%d %H:%M:%S.%f')
    stop = dt.strptime('2019-04-25 09:59:43.534', '%Y-%m-%d %H:%M:%S.%f')

    start_1 = dt.strptime('09:34:11.057','%H:%M:%S.%f')
    stop_1 = dt.strptime('09:59:43.534','%H:%M:%S.%f')

    with open('file.out','r') as file:
        for line in file:
            ts = dt.strptime(line.split()[1],'%H:%M:%S.%f')
            if (ts > start_1) and (ts < stop_1):
                print line

and I got the error:

    ValueError: time data 'Platform' does not match format '%H:%M:%S.%f'

So I found another problem: some lines do not start with a datetime. Is there a way to use a regex in which I specify the datetime format?
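
One possible way to do that is sketched here, assuming that a timestamp, when present, sits at the very start of the line in the `YYYY-MM-DD HH:MM:SS.mmm` layout shown under EDIT. The names `TIMESTAMP_RE`, `TIMESTAMP_FMT` and `parse_timestamp` are hypothetical and not taken from any library:

```python
import re
from datetime import datetime

# Hypothetical pattern: date, one space, then time with milliseconds.
TIMESTAMP_RE = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}")
TIMESTAMP_FMT = "%Y-%m-%d %H:%M:%S.%f"


def parse_timestamp(line):
    """Return the leading timestamp of line as a datetime, or None."""
    match = TIMESTAMP_RE.match(line)  # match() only looks at the line start
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(0), TIMESTAMP_FMT)
    except ValueError:
        # Right shape but not a real date or time, e.g. month 13.
        return None


print(parse_timestamp("2019-04-25 09:30:46.781 text text text"))
print(parse_timestamp("Platform text text text"))  # None
```

The regex filters out lines with the wrong shape cheaply, and `strptime` then checks that the digits form a real date and time.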

EDIT 3

I fixed the ValueError raised when a line starts with a non-timestamp string, and also the IndexError raised when a line has too few fields:

    try:
        ts = dt.strptime(line.split()[1],'%H:%M:%S.%f')
        if (ts > start_1) and (ts < stop_1):
            print line
    except IndexError as err:
        continue
    except ValueError as err:
        continue

Now the output is not limited to the range I provide: it prints log lines from 2019-02-27 09:38:46.229 to 2019-02-28 09:57:11.028. Any thoughts?

Upvotes: 1

Views: 1440

Answers (1)

Martin Evans

Reputation: 46759

Your edit 2 had the right idea. You need to add exception handling to catch and skip lines that are not formatted correctly, for example blank lines or lines without a timestamp. This can be done as follows:

    from datetime import datetime

    s1 = "2019-04-25 09:24:11.057"
    s2 = "2019-04-25 09:59:43.534"

    fmt = '%Y-%m-%d %H:%M:%S.%f'

    start = datetime.strptime(s1, fmt)
    stop = datetime.strptime(s2, fmt)

    with open('file.out', 'r') as file:
        for line in file:
            line = line.strip()

            try:
                # Rebuild "date time" from the first two space-separated fields
                ts = datetime.strptime(' '.join(line.split(' ', maxsplit=2)[:2]), fmt)
            except ValueError:
                # Blank line or no timestamp at the start: skip it
                continue

            if start <= ts <= stop:
                print(line)

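The whole timestamp, date included, is used to create ts so that it can be compared correctly with start and stop. Parsing only the time part, as in edit 3, discards the date, so lines from any day whose time of day falls in the window would match.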
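
To illustrate why a time-only comparison matches lines from other days: when a format has no date directives, `strptime` fills in a default date of 1900-01-01, so every parsed value lands on the same day regardless of the date in the log line. A short check, using values taken from the question:

```python
from datetime import datetime

# Parsing a time-only string: the date defaults to 1900-01-01.
time_only = datetime.strptime("09:34:11.057", "%H:%M:%S.%f")
print(time_only)  # 1900-01-01 09:34:11.057000

# Parsing the whole timestamp keeps the real date, so ranges work across days.
fmt = "%Y-%m-%d %H:%M:%S.%f"
full = datetime.strptime("2019-02-27 09:38:46.229", fmt)
start = datetime.strptime("2019-04-25 09:24:11.057", fmt)
print(full < start)  # True: this line is before the range and is skipped
```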

Each line first has surrounding whitespace, including the trailing newline, removed. It is then split on spaces at most twice. The first two pieces are joined back together and converted into a datetime object. If the conversion fails, the line does not start with a correctly formatted timestamp and is skipped.
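
If a reusable function is wanted, as the question asks, the same steps could be wrapped in a generator. This is only a sketch: the name `lines_between` and its parameters are hypothetical, and it assumes the timestamp is the first two space-separated fields of each line.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"


def lines_between(lines, start_point, stop_point, fmt=FMT):
    """Yield the lines whose leading timestamp is within [start, stop]."""
    start = datetime.strptime(start_point, fmt)
    stop = datetime.strptime(stop_point, fmt)

    for line in lines:
        line = line.strip()
        # Date and time are the first two space-separated fields.
        stamp = " ".join(line.split(" ", maxsplit=2)[:2])
        try:
            ts = datetime.strptime(stamp, fmt)
        except ValueError:
            continue  # blank line or no timestamp at the start
        if start <= ts <= stop:
            yield line


sample = [
    "Platform text text text\n",
    "2019-04-25 09:30:46.781 text text text\n",
    "2019-04-25 10:30:00.000 text text text\n",
]
for hit in lines_between(sample, "2019-04-25 09:24:11.057", "2019-04-25 09:59:43.534"):
    print(hit)  # 2019-04-25 09:30:46.781 text text text
```

Any iterable of lines works, so an open file object can be passed in place of `sample`.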

Upvotes: 2
