Akash

Reputation: 47

How to extract data from a text file between two matched lines in Python

I have an example text file of more than 35,000 lines that contains a pattern like the one below. How can I write Python code to extract the data between two lines?

Violator was running
MaxSelect
Modified by Violator
some lines
some more lines
Violator was running
Code
fixed
Modified by Violator

I want to read the file, extract the data between "Violator was running" and "Modified by Violator" along with the line code, and write this data to a new output.txt file. The same Violator pattern repeats throughout the text file; I just want to extract the data between each pair of lines. Please help.

with open('example.txt', 'r') as rf:
   output = rf.readlines()
   s = len(output) - 1
   gen ="Violator was running"
   show = "Modified by Violator"
   for count, line in enumerate(rf,start=1):
      if re.match(gen, line) and re.match(show):
         print(rf.readlines())

This is what I have tried.

Upvotes: 0

Views: 949

Answers (3)

Ali Davood

Reputation: 101

I think this answer is clearer:

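start = 'Violator was running'
end = 'Modified by Violator'
output = []

with open('text.txt') as f:
    lines = [line.rstrip() for line in f]

# For every start marker, collect the following lines up to the next end marker.
for index, string in enumerate(lines):
    if start in string:
        for item in lines[index+1:]:
            if end not in item:
                output.append(item)
            else:
                break

# rstrip() removed the newlines, so add them back when writing.
with open('output.txt', 'w') as f:
    f.writelines(item + '\n' for item in output)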

Upvotes: 0

Ali

Reputation: 118

You can loop through the lines to collect the index of each starting point (Violator was running) and each ending point (Modified by Violator), then slice out the lines between each start and end index.

lines = [
"Violator was running",
"MaxSelect",
"Modified by Violator",
"some lines",
"some more lines",
"Violator was running",
"Code",
"fixed",
"Modified by Violator",
]

starts = []
ends = []

for idx, line in enumerate(lines):
    if line == "Violator was running":
        starts.append(idx)
    elif line == "Modified by Violator":
        ends.append(idx)
    else:
        continue

groups = []
for start, end in zip(starts, ends):
    group = lines[start+1:end]
    groups.append(group)
    
print(groups)

Output:

[['MaxSelect'], ['Code', 'fixed']]
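
To run the same idea against the file itself rather than a hard-coded list, something like the following sketch may work. It assumes each marker sits on a line of its own; extract_groups, extract_file, and the file paths are hypothetical names chosen for illustration.

```python
def extract_groups(lines, start_marker="Violator was running",
                   end_marker="Modified by Violator"):
    """Return the lines found between each start/end marker pair."""
    starts = [i for i, line in enumerate(lines) if line == start_marker]
    ends = [i for i, line in enumerate(lines) if line == end_marker]
    # zip pairs the n-th start with the n-th end
    return [lines[s + 1:e] for s, e in zip(starts, ends)]


def extract_file(in_path, out_path):
    """Read in_path, write the extracted lines to out_path, return the groups."""
    with open(in_path) as f:
        # Strip only the newline so the comparison with the markers is exact
        lines = [line.rstrip("\n") for line in f]
    groups = extract_groups(lines)
    with open(out_path, "w") as f:
        for group in groups:
            for line in group:
                f.write(line + "\n")
    return groups
```

Calling extract_file("example.txt", "output.txt") on the sample from the question should write MaxSelect, Code, and fixed to output.txt. Because zip pairs markers purely by order, an unmatched start or end line would shift every later pair, so this only fits files where the markers are well balanced.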

Upvotes: 1

Uncle Dino

Reputation: 864

For simple tasks, I would recommend regex, but as you mentioned, this file is huge, and we should avoid loading it into memory.
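
For a file that does fit comfortably in memory, the regex route could look roughly like the sketch below. The pattern assumes each marker sits on a line of its own, and BLOCK and extract_with_regex are illustrative names, not anything from the question.

```python
import re

# Assumed pattern: a start-marker line, then (lazily) everything up to
# the next line that consists of the end marker.
BLOCK = re.compile(
    r"^Violator was running\n(.*?)^Modified by Violator$",
    re.MULTILINE | re.DOTALL,
)


def extract_with_regex(text):
    """Return the text between each marker pair as a list of strings."""
    return BLOCK.findall(text)
```

Each returned string keeps its line breaks, so it can be written to the output file as-is. The whole file has to be read into one string first, which is the cost this approach carries on large inputs.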

Processing the file line-by-line is easy as others have mentioned, but you need to do the filtering yourself.

Quick-and-dirty solution, but a workable starting point:

with open("file_location") as infile:
    save_line = False
    out_lines = []
    # start=1 so the reported numbers match the line numbers in the file
    for no, line in enumerate(infile, start=1):
        text = line.rstrip("\n")  # also handles a final line with no newline
        if text == "Violator was running":
            save_line = True
        elif text == "Modified by Violator":
            save_line = False
        elif save_line:
            out_lines.append(f"Line {no} - '{text}'\n")
with open("out_file", "w") as outfile:
    outfile.writelines(out_lines)
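
The code above still collects every matching line in out_lines before writing. If memory is the real concern, one possible variation is to write each line as soon as it is found. This is only a sketch: copy_between_markers and its parameters are hypothetical names, and it assumes the markers appear on lines of their own.

```python
def copy_between_markers(in_path, out_path,
                         start="Violator was running",
                         end="Modified by Violator"):
    """Stream in_path to out_path, keeping only lines between the markers.

    Returns the number of lines written.
    """
    written = 0
    inside = False
    with open(in_path) as infile, open(out_path, "w") as outfile:
        for line in infile:  # one line at a time, never the whole file
            text = line.rstrip("\n")
            if text == start:
                inside = True
            elif text == end:
                inside = False
            elif inside:
                outfile.write(text + "\n")
                written += 1
    return written
```

Only the current line is held in memory at any point, so the size of the input should not matter much.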

Upvotes: 0
