How to fetch all text between two timestamps multiple times with Python, keeping the timestamp too

Question

I am trying to fetch all the text between two timestamps, and to do this repeatedly for every pair of consecutive timestamps in a file. Example:

2022/May/31 14:42:33.775887 blabla.
2022/May/31 14:42:33.775907 abc, id 2
2022/May/31 14:42:33.781586 blabla {
one
huge
block
of
text
}
2022/May/31 14:42:33.781982 def

As you can see, it is a log file with many timestamps; some entries are short single lines and others have more text before the next timestamp.

The goal is to fetch the text between each pair of consecutive timestamps and then modify, analyze, or discard it as needed.

I have one approach, but it does not do what I want and it causes problems when analyzing the text.

import re

with open(infile) as fp:
    # Lazily match from a date up to the next '}' (re.S lets '.' span newlines).
    for result in re.findall(r"\d{4}/\w{3}/\d{2}(.*?)\}", fp.read(), re.S):
        print(result)

This works when the timestamp is followed by a block of text {...}, but when it is not, the match continues until it finds the next '}', which is wrong because that '}' belongs to a different timestamp.

So when I run this on the example above, it gives me:

2022/May/31 14:42:33.775887 blabla.
2022/May/31 14:42:33.775907 abc, id 2
2022/May/31 14:42:33.781586 blabla {
one
huge
block
of
text
}

I also tried this, but it doesn't give me anything:

with open(infile) as fp:
    # Without re.M, '^' only anchors at the very start of the file contents.
    for result in re.findall(r"^\d{4}/\w{3}/\d{2}(.*?)\d{4}/\w{3}/\d{2}", fp.read(), re.S):
        print(result)

Any clue what I am doing wrong?
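One likely reason the second attempt falls short: without re.M, '^' matches only at the very start of the string, and the closing timestamp is consumed by the match, so it cannot start the next one. A minimal sketch of a possible fix, assuming the log looks like the example above, uses re.M together with a lookahead so the next timestamp is checked but not consumed. The names `sample`, `entry_re`, and `entries` are illustrative only.

```python
import re

# Sample text copied from the example in the question (assumption: the real
# log follows the same layout).
sample = """2022/May/31 14:42:33.775887 blabla.
2022/May/31 14:42:33.775907 abc, id 2
2022/May/31 14:42:33.781586 blabla {
one
huge
block
of
text
}
2022/May/31 14:42:33.781982 def
"""

# re.M makes '^' match at every line start; re.S lets '.' span newlines.
# The lookahead (?=...) stops at the next timestamp (or the end of the text)
# without consuming it, so that timestamp can begin the next match.
entry_re = re.compile(
    r"^(\d{4}/\w{3}/\d{2} \d\d:\d\d:\d\d\.\d+)(.*?)(?=^\d{4}/\w{3}/\d{2} |\Z)",
    re.M | re.S,
)

# One (timestamp, text) pair per log entry, with surrounding whitespace removed.
entries = [(ts, text.strip()) for ts, text in entry_re.findall(sample)]

for ts, text in entries:
    print(ts, "->", repr(text))
```

Because the timestamp and the text are captured in separate groups, each entry can be analyzed or discarded without splitting the string again.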

The approach from 'The fourth bird' works beautifully:

# Match a timestamp line plus any following lines that do not start with a date.
pattern = r"^\d{4}/\w{3}/\d{2} (\d\d:\d\d:\d\d\.\d+ )(.*(?:\n(?!\d{4}/\w{3}/\d{2}\b).*)*)"
with open(infile) as fp:
    for result in re.findall(pattern, fp.read(), re.M):
        print(result)

This would give me (with the timestamp):

['14:42:33.775887 blabla.', '14:42:33.775907 abc, id 2', '14:42:33.781586 blabla {\none\nhuge\nblock\nof\ntext\n}', '14:42:33.781982 def']
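For large log files, reading everything with fp.read() may not be desirable. As an alternative to the regex above (not the answerer's method), here is a rough sketch that walks the file line by line and groups continuation lines under the most recent timestamp. The helper name `iter_entries` and the `TIMESTAMP_RE` pattern are assumptions made for illustration, and io.StringIO stands in for an open file.

```python
import io
import re

# Assumed timestamp layout, taken from the example log: date, time, then text.
TIMESTAMP_RE = re.compile(r"^(\d{4}/\w{3}/\d{2} \d\d:\d\d:\d\d\.\d+) ?(.*)$")


def iter_entries(lines):
    """Yield (timestamp, text) pairs; lines without a timestamp extend the text."""
    timestamp, chunk = None, []
    for line in lines:
        line = line.rstrip("\n")
        match = TIMESTAMP_RE.match(line)
        if match:
            # A new timestamp closes the previous entry, if there was one.
            if timestamp is not None:
                yield timestamp, "\n".join(chunk)
            timestamp, chunk = match.group(1), [match.group(2)]
        elif timestamp is not None:
            # Continuation line: belongs to the entry currently being built.
            chunk.append(line)
    if timestamp is not None:
        yield timestamp, "\n".join(chunk)


# io.StringIO is a stand-in for `open(infile)`; it also iterates line by line.
sample = io.StringIO(
    "2022/May/31 14:42:33.775887 blabla.\n"
    "2022/May/31 14:42:33.775907 abc, id 2\n"
    "2022/May/31 14:42:33.781586 blabla {\n"
    "one\nhuge\nblock\nof\ntext\n}\n"
    "2022/May/31 14:42:33.781982 def\n"
)

entries = list(iter_entries(sample))

# Example of discarding entries: keep only those that contain a {...} block.
blocks = [(ts, text) for ts, text in entries if "{" in text]
print(blocks)
```

With a real file, `iter_entries(fp)` inside a `with open(infile) as fp:` block should behave the same way, since file objects also yield one line at a time.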
