ajburnett34

Reputation: 37

Getting Text Between Two Strings In a File

I am trying to get the text between a "start" string and an "end" string in a file. I am somewhat successful, but the issue I am having is that lines are duplicated in my output file. I am almost certain it is because of that third for loop. I need that third for loop because I have two "end" strings that I am iterating through. Is there a way to iterate through the "end" keys without duplicating the writes to my output file?

file 1:

TEXT NOT NEEDED
12345-67897
[more text here]
AMOUNT POSTED: $43,000.00
Text not needed

file 2:

TEXT NOT NEEDED
12345-67897
[more  text here]
N= None Billable, B= Billable
TEXT NOT NEEDED

code:

start_key = '12345-67897'
end_key = ['AMOUNT POSTED: $43,000.00', 'N= None Billable, B= Billable']
# raw strings, so that \U in the Windows paths is not read as an escape sequence
input_fp = [r'C:\User\inputfilepath.txt', r'C:\User\inputfilepath2.txt']
output_fp = [r'C:\User\outputfilepath.txt', r'C:\User\outputfilepath2.txt']

for fp, ofp in zip(input_fp, output_fp):
    with open(fp, 'r') as file, open(ofp, 'w') as ofp:
        parsing = False
        for line in file:
            # third loop: one pass per end key
            for ek in end_key:
                if start_key in line.strip():
                    parsing = True
                if ek in line.strip():
                    parsing = False
                if parsing:
                    ofp.write(line)

current output:

File 1:

12345-67897
12345-67897
[more text here]
[more text here]
AMOUNT POSTED: $43,000.00
AMOUNT POSTED: $43,000.00

File 2:

12345-67897
12345-67897
[more  text here]
[more  text here]
N= None Billable, B= Billable
N= None Billable, B= Billable

Upvotes: 0

Views: 260

Answers (3)

2Heavy
2Heavy

Reputation: 61

You could use a regex to find and extract the text, then take the match result and save it to the other file:

import re

start_key = '12345-67897\n'
# raw string, so the backslashes reach the regex engine unchanged
end_key = r'AMOUNT POSTED: \$43,000\.00'
input_fp = 'test.txt'
output_fp = 'test2.txt'

with open(input_fp, 'r') as file, open(output_fp, 'w') as ofp:
    text = file.read()
    result = re.search(r'(?<=%s)[\S\s]*(?=%s)' % (start_key, end_key), text)
    if result:
        print(result.group(0))
        # save the extracted text to the output file
        ofp.write(result.group(0))

Output:

[more text here]

The pattern (?<=12345-67897\n)[\S\s]*(?=AMOUNT POSTED: \$43,000\.00) finds any text between those two strings; a full explanation of this regex can be found on regexr. Note the backslashes, which suppress the special meaning of the $ and . characters.
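The pattern above matches a single end key, while the question has two. One possible extension, sketched here under the assumption that the whole file fits in memory, builds the end part as an alternation with re.escape() (so $ and . need no manual escaping) and uses a non-greedy [\S\s]*? so the match stops at the first end key found. The extract() helper and the sample string are hypothetical, for illustration only.

```python
import re

# Keys from the question; both end markers are tried in one pattern.
start_key = '12345-67897'
end_keys = ['AMOUNT POSTED: $43,000.00', 'N= None Billable, B= Billable']

# re.escape() escapes special characters such as $ and . in each key,
# and '|'.join() builds an alternation that matches either end key.
pattern = re.compile(
    r'(?<=%s\n)[\S\s]*?(?=%s)'
    % (re.escape(start_key), '|'.join(map(re.escape, end_keys)))
)


def extract(text):
    """Hypothetical helper: text between start_key and the first end key, or None."""
    match = pattern.search(text)
    return match.group(0) if match else None


# In-memory stand-in for file 2 from the question.
file2 = ('TEXT NOT NEEDED\n12345-67897\n[more  text here]\n'
         'N= None Billable, B= Billable\nTEXT NOT NEEDED\n')
print(extract(file2))
```

The lookbehind stays fixed-width because re.escape() only adds literal backslashes, which is what Python's re module requires for (?<=...).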

Upvotes: 0

paul kim

Reputation: 111

In your code, I found that the third loop calls the write function twice, once for each end key!
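To make this concrete, here is a sketch that runs the question's nested loops over an in-memory list of lines; the list and the written collector are hypothetical stand-ins for the input and output files. Each line seen while parsing is true gets collected once per end key, i.e. twice here.

```python
# The question's loop structure, run over a list instead of a file
# (the list and `written` are hypothetical stand-ins for the files).
start_key = '12345-67897'
end_key = ['AMOUNT POSTED: $43,000.00', 'N= None Billable, B= Billable']

lines = ['TEXT NOT NEEDED\n', '12345-67897\n', '[more text here]\n',
         'AMOUNT POSTED: $43,000.00\n', 'Text not needed\n']

written = []
parsing = False
for line in lines:
    for ek in end_key:              # the third loop: runs once per end key
        if start_key in line.strip():
            parsing = True
        if ek in line.strip():
            parsing = False
        if parsing:
            written.append(line)    # stands in for ofp.write(line)

print(written)
```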

I recommend splitting the conditions in your loop, like below.

start_key = '12345-67897'
end_key = ['AMOUNT POSTED: $43,000.00', 'N= None Billable, B= Billable']
input_fp = ['./inputfilepath.txt', './inputfilepath2.txt']
output_fp = ['./outputfilepath.txt', './outputfilepath2.txt']

for fp, ofp in zip(input_fp, output_fp):
    with open(fp, 'r') as file, open(ofp, 'w') as ofp:
        parsing = False
        for line in file:

            line_stripped = line.strip()

            # Starting condition
            if start_key in line_stripped:
                parsing = True

            # Writing condition
            if parsing:
                ofp.write(line)

            # Stop condition: stop reading once any end key is found
            if any(ek in line_stripped for ek in end_key):
                break

Output (outputfilepath.txt):

12345-67897
[more text here]
AMOUNT POSTED: $43,000.00

Output (outputfilepath2.txt):

12345-67897
[more  text here]
N= None Billable, B= Billable
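The same logic can be wrapped in a function that accepts any iterable of lines (an open file or a list), which makes it easy to try without files. This is a sketch: extract_block is a hypothetical name, and unlike the loop above it only breaks on an end key after the start key has been seen, so an end marker appearing earlier in the file does not stop the scan.

```python
import io


def extract_block(lines, start_key, end_keys):
    """Hypothetical helper: lines from the start key through the first end key.

    Works on any iterable of lines (a file object, io.StringIO, or a list).
    """
    block = []
    parsing = False
    for line in lines:
        stripped = line.strip()
        if start_key in stripped:
            parsing = True
        if parsing:
            block.append(line)
        # Only stop once inside the block, so an earlier end marker is ignored.
        if parsing and any(ek in stripped for ek in end_keys):
            break
    return block


# io.StringIO stands in for an open input file here.
sample = io.StringIO('TEXT NOT NEEDED\n12345-67897\n[more  text here]\n'
                     'N= None Billable, B= Billable\nTEXT NOT NEEDED\n')
keys = ['AMOUNT POSTED: $43,000.00', 'N= None Billable, B= Billable']
print(''.join(extract_block(sample, '12345-67897', keys)))
```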

Upvotes: 0

CodeMonkey

Reputation: 23738

You need to check whether the start key is in the input line before looping over the end keys, and then check the parsing flag only once, after that loop. This ensures each line is written only once. Also, you don't need to strip each input line, since the in operator already does a substring check.

Try this:

# start_key, end_key, input_fp and output_fp as defined in the question
for fp, ofp in zip(input_fp, output_fp):
    with open(fp, 'r') as file, open(ofp, 'w') as ofp:
        parsing = False
        for line in file:
            if start_key in line:
                parsing = True
            for ek in end_key:
                if ek in line:
                    parsing = False
                    break
            if parsing:
                ofp.write(line)
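A sketch of the same logic as a generator, assuming you want every line from the start key up to, but not including, the end-key line (parsing is reset before the write check, so the end line itself is skipped). lines_between is a hypothetical name. Because this version never breaks out of the file loop, a file containing several start/end blocks yields all of them.

```python
def lines_between(lines, start_key, end_keys):
    """Hypothetical helper: yield lines from start_key up to (not including) an end key."""
    parsing = False
    for line in lines:
        if start_key in line:
            parsing = True
        # Equivalent to the inner for/break loop over the end keys.
        if any(ek in line for ek in end_keys):
            parsing = False
        if parsing:
            yield line


sample = ['TEXT NOT NEEDED\n', '12345-67897\n', '[more text here]\n',
          'AMOUNT POSTED: $43,000.00\n', 'Text not needed\n']
keys = ['AMOUNT POSTED: $43,000.00', 'N= None Billable, B= Billable']
print(list(lines_between(sample, '12345-67897', keys)))
# → ['12345-67897\n', '[more text here]\n']
```

To write the result, pass the open input file and call ofp.writelines(lines_between(file, start_key, end_key)) inside the existing with block.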

Upvotes: 2
