Reputation: 67
I have a text file containing sets of text that I need to extract. It looks something like this:
ITEM A blah blah blah ITEM B bloo bloo bloo ITEM A blee blee blee ITEM B
Here is the working code I have so far:
import re

# usefile holds the contents of the text file as a single string
finda = r'(Item\sA)'
findb = r'(Item\sB)'
match_a = re.finditer(finda, usefile, 2)  # the "2" is a flag to say ignore case (re.IGNORECASE)
match_b = re.finditer(findb, usefile, 2)
I know that I can use match methods like span(), start(), and end() to find the positions of my matches. But I need to do this many times, so what I need is:
Thanks a ton in advance! I have been spinning my wheels for a while.
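For comparison, the position-based approach the question hints at can be sketched as below. This is a minimal illustration, not code from the thread: the function name between_markers is hypothetical, and it assumes the whole file has already been read into one string (like usefile). It pairs the end() of each Item A match with the start() of the first Item B match after it and slices out the text in between.

```python
import re


def between_markers(text):
    """Return the text between each 'Item A' and the next 'Item B' (case-insensitive)."""
    a_matches = re.finditer(r'Item\sA', text, re.IGNORECASE)
    b_matches = list(re.finditer(r'Item\sB', text, re.IGNORECASE))
    chunks = []
    resume = 0  # end of the last 'Item B' used; earlier 'Item A' hits are skipped
    for a in a_matches:
        if a.start() < resume:
            continue  # this 'Item A' sits inside a chunk already extracted
        # first 'Item B' that begins at or after the end of this 'Item A'
        b = next((m for m in b_matches if m.start() >= a.end()), None)
        if b is None:
            break  # no closing 'Item B' remains
        chunks.append(text[a.end():b.start()])  # slice using the match positions
        resume = b.end()
    return chunks
```

A length filter such as the 50-character one used in the answers can then be layered on top, e.g. [c for c in between_markers(usefile) if len(c) > 50]. The regex answers below do the same pairing in a single pass, which is usually simpler.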
Upvotes: 1
Views: 1573
Reputation: 319929
Why not just:
import re

with open(fname, 'w') as file:
    for match in re.finditer(r'Item A(.+?)Item B', subject, re.I):
        s = match.group(1)
        if len(s) > 50:
            file.write(s + "\n")  # one extracted chunk per line
Note: using the actual numerical values of flags is rather obscure; use the named flags provided in re instead (e.g. re.I, which is the same as re.IGNORECASE).
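To try the loop above without writing a file, the same pattern can collect the pieces into a list. This is a sketch, not the answerer's code: extract_lazy is a hypothetical name, min_len stands in for the hard-coded 50, and re.S (DOTALL) is an addition not present in the answer, so that . also matches newlines when an item spans several lines of the file.

```python
import re


def extract_lazy(subject, min_len=50):
    """Text between 'Item A' and the nearest following 'Item B', kept if longer than min_len.

    re.S is an assumption: it lets '.' cross newlines; drop it if items never span lines.
    """
    return [
        m.group(1)
        for m in re.finditer(r'Item A(.+?)Item B', subject, re.I | re.S)
        if len(m.group(1)) > min_len
    ]
```

The lazy .+? stops at the first Item B after each Item A, so consecutive items are not merged into one long match.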
Upvotes: 2
Reputation: 336468
This can be done in a single regex:
import re

with open("output.txt", "w") as f:
    for match in re.finditer(r"(?<=Item\sA)(?:(?!Item\sB).){50,}(?=Item\sB)", subject, re.I):
        f.write(match.group() + "\n")
This matches what is between Item A and Item B. Or did you want to match the delimiters, too?
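If the delimiters should be part of the result, the lookarounds can be replaced by ordinary tokens that consume Item A and Item B. A sketch of both variants side by side; the function name between and its min_len / keep_delimiters parameters are hypothetical, introduced only so the 50-character minimum can be lowered for short test strings.

```python
import re


def between(subject, min_len=50, keep_delimiters=False):
    """Find runs of at least min_len characters between 'Item A' and 'Item B'."""
    # tempered token: any character, as long as it does not start 'Item B'
    body = r'(?:(?!Item\sB).){%d,}' % min_len
    if keep_delimiters:
        # consume the markers so they become part of match.group()
        pattern = r'Item\sA' + body + r'Item\sB'
    else:
        # lookarounds check the markers without including them in the match
        pattern = r'(?<=Item\sA)' + body + r'(?=Item\sB)'
    return [m.group() for m in re.finditer(pattern, subject, re.I)]
```

With the question's sample and min_len=10, the first variant returns " blah blah blah " and " blee blee blee ", while keep_delimiters=True returns the same pieces with ITEM A and ITEM B attached.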
The regex explained:
(?<=Item\sA) # assert that we start our match right after "Item A"
(?: # start repeated group (non-capturing)
(?!Item\sB) # assert that we're not running into "Item B"
. # then match any character
){50,} # repeat this at least 50 times
(?=Item\sB) # then assert that "Item B" follows next (without making it part of the match)
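The commented breakdown above is close to valid Python already: with the re.VERBOSE flag, whitespace in the pattern is ignored and # starts a comment, so the explanation can live inside the compiled pattern itself. A sketch (the name ITEM_BODY is illustrative):

```python
import re

ITEM_BODY = re.compile(r"""
    (?<=Item\sA)     # assert that we start our match right after "Item A"
    (?:              # start repeated group (non-capturing)
        (?!Item\sB)  # assert that we're not running into "Item B"
        .            # then match any character
    ){50,}           # repeat this at least 50 times
    (?=Item\sB)      # then assert that "Item B" follows next
""", re.IGNORECASE | re.VERBOSE)
```

ITEM_BODY.finditer(subject) then yields the same matches as the one-line pattern with re.I. Because the pattern uses \s rather than a literal space, it needs no change under re.VERBOSE, which would otherwise silently drop the space and look for "ItemA".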
Upvotes: 2