Include all lines in between the first and last occurrence

Question

I have a text file with contents like this:

    [2018-07-11 20:57:08] SYSTEM RESPONSE: "hello"
    [2018-07-11 20:57:19] USER INPUT (xvp_dev-0): "hi! how is it going?"
    [2018-07-11 20:57:19] SYSTEM RESPONSE: "It's going pretty good. 
     How about you?"
    [2018-07-11 14:05:20] USER INPUT (xvp_dev-0): I've been doing good too!

    Thank you.
    [2018-07-12 14:05:20] SYSTEM RESPONSE: "Hello!"
    How is your day going today?
    [2018-07-12 14:05:34] USER INPUT (xvp_dev-0): "Great! Can't complain"
    [2018-07-12 14:05:34] SYSTEM RESPONSE: "Okay. 
    That's good"

Now, I want all the lines from the first occurrence of [2018-07-11] to the last, including all the lines in between. Currently, I am just finding all the lines that start with [2018-07-11 and displaying them, but as you can see, a few lines in between do not start with the date, and those are getting lost.

for line in file:
    if b in line:  # b is the date string entered by the user
        x = x + "//" + line[11:]

Sample output would be something like this. For the date 2018-07-11:

20:57:08] SYSTEM RESPONSE: "hello"
20:57:19] USER INPUT (xvp_dev-0): "hi! how is it going?"
20:57:19] SYSTEM RESPONSE: "It's going pretty good. 
How about you?"
14:05:20] USER INPUT (xvp_dev-0): I've been doing good too!
Thank you.

For the date 2018-07-12:

14:05:20] SYSTEM RESPONSE: "Hello!"
How is your day going today?
14:05:34] USER INPUT (xvp_dev-0): "Great! Can't complain"
14:05:34] SYSTEM RESPONSE: "Okay. 
That's good"

Any idea how I could get the lines in between too? Since the entries are ordered by date, a date cannot occur again later in the text once a newer date has started.

Andrej Kesely · Accepted Answer

You can use regular expressions to parse the lines. I made a function find_lines_by_date() where you can supply the date string and it will return a list of lines with this date:

data = """
    [2018-07-11 20:57:08] SYSTEM RESPONSE: "hello"
    [2018-07-11 20:57:19] USER INPUT (xvp_dev-0): "hi! how is it going?"
    [2018-07-11 20:57:19] SYSTEM RESPONSE: "It's going pretty good.
     How about you?"
    [2018-07-11 14:05:20] USER INPUT (xvp_dev-0): I've been doing good too!

    Thank you.
    [2018-07-12 14:05:20] SYSTEM RESPONSE: "Hello!"
    How is your day going today?
    [2018-07-12 14:05:34] USER INPUT (xvp_dev-0): "Great! Can't complain"
    [2018-07-12 14:05:34] SYSTEM RESPONSE: "Okay.
    That's good"
"""

import re
import pprint

def find_lines_by_date(date='2018-07-11'):
    rv = []
    groups = re.findall(r'(\[(.*?)\s+.*?\][^\[]+)', data)
    for g in groups:
        if g[-1] == date:
            rv.append(g[0].strip())
    return rv


pprint.pprint(find_lines_by_date(date='2018-07-12'))

This will print:

['[2018-07-12 14:05:20] SYSTEM RESPONSE: "Hello!"\n'
 '    How is your day going today?',
 '[2018-07-12 14:05:34] USER INPUT (xvp_dev-0): "Great! Can\'t complain"',
 '[2018-07-12 14:05:34] SYSTEM RESPONSE: "Okay.\n    That\'s good"']
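If the log is read line by line from a file, as in the question, the same idea can be sketched without loading everything into one string. This is only an illustration, not part of the original answer: the names ENTRY_START and lines_for_date are hypothetical, and it assumes every new entry begins with a [YYYY-MM-DD HH:MM:SS] timestamp. Lines without a timestamp are treated as continuations of the entry above them.

```python
import re

# Assumption: every new log entry begins with "[YYYY-MM-DD HH:MM:SS]".
ENTRY_START = re.compile(r'\s*\[(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2})\]')


def lines_for_date(lines, date):
    """Return the lines of every entry stamped with `date`, continuation lines included."""
    selected = []
    keep = False  # True while we are inside an entry with the wanted date
    for line in lines:
        m = ENTRY_START.match(line)
        if m:
            # A new entry starts here; decide whether it belongs to the date.
            keep = (m.group(1) == date)
        # Lines without a timestamp inherit the decision of the entry above them.
        # Blank lines are skipped.
        if keep and line.strip():
            selected.append(line.strip())
    return selected


sample = """\
[2018-07-11 20:57:19] SYSTEM RESPONSE: "It's going pretty good.
 How about you?"
[2018-07-11 14:05:20] USER INPUT (xvp_dev-0): I've been doing good too!

Thank you.
[2018-07-12 14:05:20] SYSTEM RESPONSE: "Hello!"
How is your day going today?
"""

result = lines_for_date(sample.splitlines(), '2018-07-11')
print("\n".join(result))
```

Because the function only iterates over its first argument, an open file object can be passed in place of sample.splitlines().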

EDIT:

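The regexp (\[(.*?)\s+.*?\][^\[]+) contains two capturing groups, so re.findall() returns a two-element tuple for each match: the first element is the whole entry (the timestamped line plus any lines that follow it up to the next "["), which goes into the return value, and the second element is the date, which is used for the comparison.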
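To make the shape of the findall() result concrete, here is a small sketch on a shortened, made-up sample (the sample string is an assumption for illustration, not the original data). It unpacks each tuple into the entry text and the date.

```python
import re

# Shortened sample; the second line has no timestamp and belongs to the first entry.
sample = (
    '[2018-07-12 14:05:20] SYSTEM RESPONSE: "Hello!"\n'
    'How is your day?\n'
    '[2018-07-12 14:05:34] USER INPUT: "Great!"\n'
)

# Two capturing groups: the outer one is the whole entry, the inner one is the date.
groups = re.findall(r'(\[(.*?)\s+.*?\][^\[]+)', sample)

for entry, date in groups:
    print(repr(date), repr(entry.strip()))
```

One consequence of [^\[]+ is that an entry ends at the next "[" character, so a message that itself contains a "[" would be cut short at that point.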

I made a simple example on an external site with a detailed explanation:
