Reputation: 23
I have a log that contains a structure like this:
OPEN [multiline content] START
OPEN [multiline content] START
OPEN [multiline content] START
I need to count the occurrences of this structure in my log, but sometimes, for some odd reason, the log looks like this:
OPEN [multiline content] OPEN [multiline content] START
(An OPEN must be followed by a START, but in this case the first one is not.)
I have a basic regex like this: https://regex101.com/r/URPsTG/1
I want to match the part "OPEN ... START", so when the structure is
OPEN [multiline content] OPEN [multiline content] START
only the final "OPEN [multiline content] START" part (bold in the original post) should match, and the rest should be ignored.
How can I do that?
Thanks!
Upvotes: 2
Views: 39
Reputation: 7616
Adding the prefix (?:(?!^OPEN)OPEN.*?)? to your existing regex should do it:
/(?:(?!^OPEN)OPEN.*?)?OPEN(.*?)START/gms
Explanation:
(?:(?!^OPEN)OPEN.*?)? - non-capturing group, optional, for an OPEN not at the beginning of the text, followed by a non-greedy scan
Upvotes: 0
Reputation: 163362
You could match OPEN, followed by all lines that are not exactly OPEN or START, to prevent overmatching.
^OPEN((?:\r?\n(?!(?:OPEN|START)$).*)*)\r?\nSTART
In parts, the pattern matches:
^ - Start of string (or of a line, when the multiline flag is set)
OPEN - Match literally
( - Capture group 1
(?: - Non-capturing group
\r?\n(?!(?:OPEN|START)$) - Match a newline and assert that the next line is not exactly OPEN or START
.* - Match the whole line
)* - Close the non-capturing group and optionally repeat it to match all lines
) - Close group 1
\r?\nSTART - Match a newline followed by START
There is no language tagged, but if it is supported, you could prevent some backtracking by using an atomic group (?> (or by making the quantifier of the outer non-capturing group possessive, *+):
^OPEN((?>\r?\n(?!(?:OPEN|START)$).*)*)\r?\nSTART
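As a rough illustration, here is how the first pattern above (the version without the atomic group) could be used to count blocks in Python. The sample log is made up for this sketch, and it assumes that OPEN and START each sit on a line of their own; the multiline flag is needed so that ^ and $ match at line boundaries.

```python
import re

# Hypothetical sample log: one well-formed block, then an orphan OPEN
# followed by a well-formed block. The real log layout is an assumption.
sample_log = "OPEN\na\nSTART\nOPEN\nb\nOPEN\nc\nSTART\n"

# re.MULTILINE makes ^ and $ match at the start and end of each line.
line_pattern = re.compile(
    r"^OPEN((?:\r?\n(?!(?:OPEN|START)$).*)*)\r?\nSTART",
    re.MULTILINE,
)

line_matches = [m.group(0) for m in line_pattern.finditer(sample_log)]
print(len(line_matches))  # 2
print(line_matches)       # ['OPEN\na\nSTART', 'OPEN\nc\nSTART']
```

The orphan OPEN before "b" is skipped because the repeated group refuses to consume a line that is exactly OPEN, so no START can be reached from it.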
Upvotes: 1
Reputation: 521514
I would use a tempered dot here:
\bOPEN\b(?:(?!\bOPEN\b).)*?\bSTART\b
This regex works by:
\bOPEN\b match an initial "OPEN"
(?:(?!\bOPEN\b).)*? match all content without passing another "OPEN", until reaching the nearest "START"
\bSTART\b match the closing "START"
The second step of the above regex uses a "tempered dot" trick. The negative lookahead makes sure that the dot does not pass over another OPEN, which would mean that the match did not begin at the OPEN nearest to the closing START. Since the content spans several lines, the dot must also be allowed to match newlines (dot-all mode).
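A minimal sketch of the tempered-dot pattern in Python, assuming a made-up sample log; re.DOTALL is used so that the dot can cross line breaks.

```python
import re

# Hypothetical sample log: the second OPEN has no START of its own.
tempered_log = "OPEN\na\nSTART\nOPEN\nb\nOPEN\nc\nSTART\n"

# re.DOTALL lets "." match newlines, so the block content may be multiline.
tempered_pattern = re.compile(
    r"\bOPEN\b(?:(?!\bOPEN\b).)*?\bSTART\b",
    re.DOTALL,
)

# The pattern has no capturing groups, so findall returns the full matches.
tempered_blocks = tempered_pattern.findall(tempered_log)
print(len(tempered_blocks))  # 2
print(tempered_blocks)       # ['OPEN\na\nSTART', 'OPEN\nc\nSTART']
```

Unlike the line-based pattern, this one does not require OPEN and START to be alone on their lines; it only requires them to appear as whole words.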
Upvotes: 0