user401

Reputation: 63

PowerShell Regex: Capturing text between two strings that are on multiple lines

I may have something like this:

FIRST|[some text here] (newline)
[insert text here] (newline)
SECOND|A (newline)

FIRST|[some text here] (newline)
[insert text here] (newline)
SECOND|B (newline)

FIRST|[some text here] (newline)
[insert text here] (newline)
SECOND|A (newline)

FIRST|[some text here] (newline)
[insert text here] (newline)
SECOND|B (newline)

FIRST|[some text here] (newline)
[insert text here] (newline)
SECOND|A (newline)

I only want to capture everything from FIRST to SECOND|B and exclude anything from FIRST to SECOND|A. The order in this post is just an example and may differ in the files I am working with. The text in brackets could be words, digits, special characters, etc., and (newline) just indicates that what follows is on a different line.

I have tried FIRST[\s\S]+SECOND\|B (https://regex101.com/r/CwzCyz/2), but that matches from the first FIRST to the last SECOND|B. This works in regex101.com but not in my PowerShell ISE application, which I am guessing is because I have the flavor set to PCRE (PHP).

Upvotes: 4

Views: 660

Answers (2)

user12097764

Reputation:

FIRST\|(?:(?!SECOND\|[^B])[\S\s])*?SECOND\|B

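This will not match a FIRST| that is associated with SECOND|A (or with any non-B): the lookahead stops the match from running past a SECOND| that is followed by anything other than B.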

https://regex101.com/r/e0CG9B/1

Expanded

 FIRST \| 
 (?:
      (?! SECOND \| [^B] )
      [\S\s] 
 )*?
 SECOND \| B
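
A minimal PowerShell sketch of how the pattern above could be applied. It assumes the whole input is held in one string (for a file, `Get-Content -Raw` returns a single string rather than an array of lines). The sample text and the variable names are made up for illustration.

```powershell
# Hypothetical sample input modelled on the question.
# For a real file, something like: $text = Get-Content -Raw -Path $path
$text = @'
FIRST|one
body 1
SECOND|A

FIRST|two
body 2
SECOND|B

FIRST|three
body 3
SECOND|A
'@

# Lazy match from FIRST| to SECOND|B that may not run past a
# SECOND| followed by anything other than B.
$pattern = 'FIRST\|(?:(?!SECOND\|[^B])[\S\s])*?SECOND\|B'

# [regex]::Matches returns every non-overlapping match in the string.
$found = [regex]::Matches($text, $pattern)

$found.Count                               # 1
$found | ForEach-Object { $_.Value }       # the FIRST|two ... SECOND|B block
```

Because the pattern uses `[\S\s]` instead of `.`, it does not depend on any single-line or multiline regex option.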

If you need only the innermost FIRST / SECOND pair (no other FIRST| or SECOND| inside the match), that has to be done a different way:

FIRST\|(?:(?!(?:FIRST|SECOND)\|)[\S\s])*SECOND\|B

https://regex101.com/r/qoT8U1/1
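
The difference between the two patterns only shows up when a stray FIRST| precedes the wanted block. A small sketch, using a made-up input string and hypothetical variable names, that compares them side by side:

```powershell
# Hypothetical input: a stray FIRST| line with no SECOND| of its own,
# followed by a complete FIRST| ... SECOND|B block.
$text = "FIRST|stray`nFIRST|wanted`nbody`nSECOND|B"

# First pattern: only refuses to cross SECOND|<non-B>.
$lazy  = 'FIRST\|(?:(?!SECOND\|[^B])[\S\s])*?SECOND\|B'

# Second pattern: refuses to cross any further FIRST| or SECOND|.
$inner = 'FIRST\|(?:(?!(?:FIRST|SECOND)\|)[\S\s])*SECOND\|B'

$lazyMatch  = [regex]::Match($text, $lazy).Value
$innerMatch = [regex]::Match($text, $inner).Value

$lazyMatch    # begins at FIRST|stray
$innerMatch   # begins at FIRST|wanted
```

With the first pattern the match starts at the earliest FIRST| it can, so the stray line is included. The second pattern cannot step over another FIRST|, so it only returns the innermost pair.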

Upvotes: 1

The fourth bird

Reputation: 163257

If FIRST is at the start of a line and SECOND|A or SECOND|B is also at the start of a line, you could match all following lines that do not start with SECOND\|[AB], up to the line that starts with SECOND|B:

^FIRST.*(?:\r?\n(?!SECOND\|[AB]\b).*)*\r?\nSECOND\|B\b.*

In parts

  • ^FIRST.* Start of the line, then match FIRST and the rest of that line (in .NET, ^ only matches at the start of every line when the multiline option is on, for example with a leading (?m))
  • (?: Non capturing group
    • \r?\n(?!SECOND\|[AB]\b) Match a newline, assert that the next line does not start with SECOND|A or SECOND|B
    • .* Match 0+ times any char except a newline
  • )* Close non capturing group and repeat it 0+ times
  • \r?\n Match a newline
  • SECOND\|B\b.* Match the line that starts with SECOND|B
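
A hedged PowerShell sketch of the line-based approach. Two things here are assumptions rather than part of the answer as posted: the non-capturing group is repeated with `*` so that any number of middle lines is allowed, and the inline `(?m)` option is added so that `^` matches at the start of each line. The sample text and variable names are invented for illustration.

```powershell
# Hypothetical sample: the B block has two body lines to show the repetition.
$text = @'
FIRST|one
body 1
SECOND|A
FIRST|two
body 2a
body 2b
SECOND|B
'@

# (?m) makes ^ match at every line start in .NET (assumption, see lead-in).
# The group (...)* consumes whole lines that do not start with SECOND|A or SECOND|B.
$pattern = '(?m)^FIRST.*(?:\r?\n(?!SECOND\|[AB]\b).*)*\r?\nSECOND\|B\b.*'

$found = [regex]::Matches($text, $pattern)

$found.Count                          # 1
$found | ForEach-Object { $_.Value }  # the block that begins with FIRST|two
```

Note that in .NET `.` also matches a carriage return, so with CRLF input the matched value can contain `\r` characters; calling `.Trim()` on the value is one way to drop a trailing one.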


Upvotes: 1
