dandyjuan

Reputation: 67

How do I extract text between two different matches?

I have a text file containing sets of text that I need to extract. It looks something like this:

ITEM A blah blah blah ITEM B bloo bloo bloo ITEM A blee blee blee ITEM B

Here is the working code I have so far:

finda = r'(Item\sA)'
findb = r'(Item\sB)'
match_a = re.finditer(finda, usefile, 2)  # the "2" is a flag to say ignore case
match_b = re.finditer(findb, usefile, 2)

I know I can use match-object methods such as span(), start(), and end() to find the positions of my matches, but I need to do this many times. What I need is:

  1. Start writing at ITEM A and stop writing at ITEM B.
  2. If that block is shorter than 50 characters, discard it and move on to the next one.
  3. Once a block that starts with ITEM A and ends with ITEM B is longer than 50 characters, write it to a file.

Thanks a ton in advance! I have been spinning my wheels for a while.

Upvotes: 1

Views: 1573

Answers (2)

SilentGhost

Reputation: 319929

Why not just:

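import re

with open(fname, 'w') as file:
    # The lazy (.+?) stops at the first "Item B" after each "Item A"
    for match in re.finditer(r'Item A(.+?)Item B', subject, re.I):
        s = match.group(1)
        if len(s) > 50:  # keep only blocks longer than 50 characters
            file.write(s)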

Note: passing the raw numeric value of a flag is rather opaque; use the named flags that the re module provides, such as re.I (re.IGNORECASE).
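As a rough sketch of the same idea with named flags, the loop can be wrapped in a small function. The helper name extract_blocks, its min_len parameter, and the sample string are made up for illustration. re.DOTALL is added on the assumption that a block may span several lines of the file; drop it if every block sits on one line.

```python
import re

# Hypothetical helper; the name and parameters are illustrative only.
def extract_blocks(text, min_len=50):
    """Return the text found between each 'Item A' ... 'Item B' pair."""
    # re.IGNORECASE replaces the magic number 2; re.DOTALL lets "." cross newlines.
    pattern = re.compile(r'Item\sA(.+?)Item\sB', re.IGNORECASE | re.DOTALL)
    return [m.group(1) for m in pattern.finditer(text) if len(m.group(1)) > min_len]

# Made-up sample: the first block is too short, the second is long enough.
sample = "ITEM A short ITEM B item a " + "x" * 60 + " item b"
blocks = extract_blocks(sample)
```

Returning a list instead of writing directly keeps the filtering separate from the file handling, so the caller decides where the blocks end up.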

Upvotes: 2

Tim Pietzcker

Reputation: 336468

This can be done in a single regex:

import re

with open("output.txt", "w") as f:
    for match in re.finditer(r"(?<=Item\sA)(?:(?!Item\sB).){50,}(?=Item\sB)", subject, re.I):
        f.write(match.group() + "\n")

This matches what is between Item A and Item B. Or did you want to match the delimiters, too?
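If the delimiters should be part of the output, one possible variant (a sketch, not something the question confirms is wanted) is to drop the lookarounds so that "Item A" and "Item B" are consumed by the match. The names with_delims and found, and the sample string, are made up for illustration.

```python
import re

# Variant that consumes the delimiters, so they are part of match.group().
with_delims = re.compile(r"Item\sA(?:(?!Item\sB).){50,}Item\sB", re.IGNORECASE)

subject = "ITEM A " + "z" * 55 + " ITEM B"  # made-up sample
found = with_delims.search(subject)
```

Here the 50-character minimum still counts only the text between the two delimiters.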

The regex explained:

(?<=Item\sA)   # assert that we start our match right after "Item A"
(?:            # start repeated group (non-capturing)
  (?!Item\sB)  # assert that we're not running into "Item B"
  .            # then match any character
){50,}         # repeat this at least 50 times
(?=Item\sB)    # then assert that "Item B" follows next (without making it part of the match)
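The commented layout above can also be kept in the code itself by compiling with re.VERBOSE, which ignores whitespace and # comments inside the pattern. This is a minimal sketch of that idea; the sample subject string is invented for the demonstration.

```python
import re

# The pattern from this answer, written in verbose mode so the comments live in the regex.
pattern = re.compile(r"""
    (?<=Item\sA)     # start right after "Item A"
    (?:              # repeated non-capturing group
      (?!Item\sB)    # the next characters must not be "Item B"
      .              # consume one character
    ){50,}           # at least 50 times
    (?=Item\sB)      # "Item B" must follow, but is not consumed
""", re.IGNORECASE | re.VERBOSE)

# Made-up sample: the first block is too short, the second is long enough.
subject = "ITEM A short ITEM B ITEM A " + "y" * 60 + " ITEM B"
matches = [m.group() for m in pattern.finditer(subject)]
```

With this sample, only the second block should end up in matches. Note that "." does not match a newline by default, so blocks spanning several lines would also need re.DOTALL.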

Upvotes: 2
