Reputation: 37
I am trying to extract the text between a "start" and an "end" string in a file. I am somewhat successful, but the issue is that every line is duplicated in my output file. I am almost certain this is because of the third for loop. I need that loop because I have two "end" keys to iterate through. Is there a way to iterate through the "end" keys without duplicating the writes to my output file?
file 1:
TEXT NOT NEEDED
12345-67897
[more text here]
AMOUNT POSTED: $43,000.00
Text not needed
file 2:
TEXT NOT NEEDED
12345-67897
[more text here]
N= None Billable, B= Billable
TEXT NOT NEEDED
code:
start_key = '12345-67897'
end_key = ['AMOUNT POSTED: $43,000.00', 'N= None Billable, B= Billable']
input_fp = [r'C:\User\inputfilepath.txt', r'C:\User\inputfilepath2.txt']
output_fp = [r'C:\User\outputfilepath.txt', r'C:\User\outputfilepath2.txt']
for fp, ofp in zip(input_fp,output_fp):
    with open(fp, 'r') as file, open(ofp, 'w') as ofp:
        parsing = False
        for line in file:
            for ek in end_key:
                if start_key in line.strip():
                    parsing = True
                if ek in line.strip():
                    parsing = False
                if parsing:
                    ofp.write(line)
current output:
File 1:
12345-67897
12345-67897
[more text here]
[more text here]
AMOUNT POSTED: $43,000.00
AMOUNT POSTED: $43,000.00
File 2:
12345-67897
12345-67897
[more text here]
[more text here]
N= None Billable, B= Billable
N= None Billable, B= Billable
Upvotes: 0
Views: 260
Reputation: 61
You could use a regex to find and extract the text, then save the match to the other file:
import re
start_key = '12345-67897\n'
end_key = r'AMOUNT POSTED: \$43,000\.00'  # raw string keeps the regex escapes intact
input_fp = 'test.txt'
output_fp = 'test2.txt'
with open(input_fp, 'r') as file, open(output_fp, 'w') as ofp:
    text = file.read()
    result = re.search(r'(?<=%s)[\S\s]*(?=%s)' % (start_key, end_key), text)
    if result:
        print(result.group(0))
        ofp.write(result.group(0))  # save the match to the output file
Output:
[more text here]
The pattern (?<=12345-67897\n)[\S\s]*(?=AMOUNT POSTED: \$43,000\.00) matches any text between those two strings. A full explanation of this regex can be found on regexr. Note the backslashes, which suppress the special meaning of the $ and . characters.
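As a hedged variation on the regex approach, `re.escape` can add those backslashes automatically, and an alternation can cover both end keys from the question. This is only a sketch: the function name `extract_between` and the in-memory sample string are illustrative assumptions, and a capturing group is swapped in for the lookbehind/lookahead used above.

```python
import re

def extract_between(text, start_key, end_keys):
    """Return the text between start_key and the first end key found, or None."""
    # re.escape backslash-escapes regex metacharacters such as "$" and ".".
    start = re.escape(start_key)
    ends = "|".join(re.escape(ek) for ek in end_keys)
    # The non-greedy group stops at the earliest end key;
    # re.DOTALL lets "." match newlines as well.
    match = re.search(rf"{start}\n(.*?)(?:{ends})", text, re.DOTALL)
    return match.group(1) if match else None

# Hypothetical sample mirroring "file 1" from the question.
sample = (
    "TEXT NOT NEEDED\n"
    "12345-67897\n"
    "[more text here]\n"
    "AMOUNT POSTED: $43,000.00\n"
    "Text not needed\n"
)
end_keys = ["AMOUNT POSTED: $43,000.00", "N= None Billable, B= Billable"]
between = extract_between(sample, "12345-67897", end_keys)
print(repr(between))
```

Because the end keys are joined with `|`, the same call should work for either input file without editing the pattern by hand.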
Upvotes: 0
Reputation: 111
In your code, the third loop causes write to be called twice for each line.
I recommend splitting the logic into separate start, write, and stop conditions, as below.
start_key = '12345-67897'
end_key = ['AMOUNT POSTED: $43,000.00', 'N= None Billable, B= Billable']
input_fp = ['./inputfilepath.txt', './inputfilepath2.txt']
output_fp = ['./outputfilepath.txt', './outputfilepath2.txt']
for fp, ofp in zip(input_fp,output_fp):
    with open(fp, 'r') as file, open(ofp, 'w') as ofp:
        parsing = False
        for line in file:
            line_striped = line.strip()
            # Starting condition
            if start_key in line_striped:
                parsing = True
            # Writing condition
            if parsing:
                ofp.write(line)
            # Stop condition
            if any([ek in line_striped for ek in end_key]):
                break
Output in outputfilepath.txt:
12345-67897
[more text here]
AMOUNT POSTED: $43,000.00
Output in outputfilepath2.txt:
12345-67897
[more text here]
N= None Billable, B= Billable
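
The start/write/stop structure can also be tried without touching any files. The following is a minimal sketch under stated assumptions: the helper name `copy_section` is hypothetical, `io.StringIO` objects stand in for the real input and output files, and the stop check is only made once parsing has started (a small change from the code above, so that an end key appearing before the start key does not end the loop early).

```python
import io

def copy_section(src, dst, start_key, end_keys):
    """Copy lines from the first start_key line through the first end-key line."""
    parsing = False
    for line in src:
        # Starting condition
        if start_key in line:
            parsing = True
        if parsing:
            # Writing condition: each line is written exactly once.
            dst.write(line)
            # Stop condition: any() with a generator stops at the first match.
            if any(ek in line for ek in end_keys):
                break

end_keys = ["AMOUNT POSTED: $43,000.00", "N= None Billable, B= Billable"]
# Hypothetical stand-in for "file 2" from the question.
src = io.StringIO(
    "TEXT NOT NEEDED\n"
    "12345-67897\n"
    "[more text here]\n"
    "N= None Billable, B= Billable\n"
    "TEXT NOT NEEDED\n"
)
dst = io.StringIO()
copy_section(src, dst, "12345-67897", end_keys)
result = dst.getvalue()
print(result, end="")
```

Passing open file objects instead of the `StringIO` instances should behave the same way, since the helper only iterates over `src` and calls `dst.write`.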
Upvotes: 0
Reputation: 23738
You need to check whether the start key is in the input line before looping over the end keys, then check the parsing flag once after the loop. This ensures each line is written only once. Also, there is no need to strip the input line each time.
Try this:
for fp, ofp in zip(input_fp, output_fp):
    with open(fp, 'r') as file, open(ofp, 'w') as ofp:
        parsing = False
        for line in file:
            if start_key in line:
                parsing = True
            for ek in end_key:
                if ek in line:
                    parsing = False
                    break
            if parsing:
                ofp.write(line)
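
To see why moving the write out of the inner loop matters, here is a small side-by-side sketch. It assumes in-memory lists in place of files, and the function names `write_inside_loop` and `write_after_loop` are made up for illustration.

```python
start_key = "12345-67897"
end_keys = ["AMOUNT POSTED: $43,000.00", "N= None Billable, B= Billable"]
# Hypothetical stand-in for the lines of "file 1".
lines = [
    "TEXT NOT NEEDED\n",
    "12345-67897\n",
    "[more text here]\n",
    "AMOUNT POSTED: $43,000.00\n",
    "Text not needed\n",
]

def write_inside_loop(lines):
    """Question's structure: the write sits inside the loop over end keys."""
    out, parsing = [], False
    for line in lines:
        for ek in end_keys:
            if start_key in line:
                parsing = True
            if ek in line:
                parsing = False
            if parsing:
                out.append(line)  # can run once per end key for the same line
    return out

def write_after_loop(lines):
    """This answer's structure: the flag is checked once per line, after the loop."""
    out, parsing = [], False
    for line in lines:
        if start_key in line:
            parsing = True
        for ek in end_keys:
            if ek in line:
                parsing = False
                break
        if parsing:
            out.append(line)  # runs at most once per line
    return out

print(write_inside_loop(lines))
print(write_after_loop(lines))
```

In this sketch the second version appends each line at most once. Note that the line containing the end key is not written in this form, because the flag is cleared before the write; if that line is wanted in the output, the write would need to happen before the end-key check.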
Upvotes: 2