Michel Mechoui

Reputation: 23

Regex: Get specific match for a multiline log

I have a log which contains a structure like this:

OPEN [multiline content] START

OPEN [multiline content] START

OPEN [multiline content] START

I need to count the occurrences of this structure in my log, but sometimes, for some unknown reason, the log looks like this:

OPEN [multiline content] OPEN [multiline content] START

(An OPEN must be followed by a START, but in this case the first OPEN is not.)

I have a basic regex like this: https://regex101.com/r/URPsTG/1

I want to match the part "OPEN ... START", so when the structure is

OPEN [multiline content] OPEN [multiline content] START

only the final "OPEN [multiline content] START" has to match, and the leading "OPEN [multiline content]" has to be ignored.

How can I do that?

Thanks!

Upvotes: 2

Views: 39

Answers (3)

Peter Thoeny

Reputation: 7616

Adding the prefix (?:(?!^OPEN)OPEN.*?)? to your existing regex should do it:

/(?:(?!^OPEN)OPEN.*?)?OPEN(.*?)START/gms

Explanation:

  • (?:(?!^OPEN)OPEN.*?)? - optional non-capturing group that matches an OPEN which is not at the position ^ anchors to (with the m flag, the beginning of any line), followed by a non-greedy scan

Upvotes: 0

The fourth bird

Reputation: 163362

You could match OPEN, followed by all lines that consist of neither OPEN nor START, to prevent overmatching.

^OPEN((?:\r?\n(?!(?:OPEN|START)$).*)*)\r?\nSTART

In parts, the pattern matches:

  • ^ Start of the string (or of a line, with the multiline flag enabled)
  • OPEN Match literally
  • ( Capture group 1
    • (?: Non capture group
      • \r?\n(?!(?:OPEN|START)$) Match a newline and assert that the next line is not exactly OPEN or START
      • .* Match the whole line
    • )* Close non capture group and optionally repeat to match all lines
  • ) Close group 1
  • \r?\nSTART Match a newline followed by START
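
As a rough illustration of how this pattern behaves, here is a minimal Python sketch. The sample log and the helper name find_blocks are made up for the example, and it assumes that every OPEN and START sits alone on its own line with plain \n line endings. re.MULTILINE is passed so that ^ and $ anchor at line boundaries.

```python
import re

# Hypothetical sample log: the second block has a stray OPEN with no START.
SAMPLE_LOG = "\n".join([
    "OPEN", "a1", "a2", "START",
    "OPEN", "b",
    "OPEN", "c", "START",
])

# Pattern from the answer; MULTILINE makes ^ and $ match at line boundaries.
LINE_PATTERN = re.compile(
    r"^OPEN((?:\r?\n(?!(?:OPEN|START)$).*)*)\r?\nSTART",
    re.MULTILINE,
)

def find_blocks(log):
    """Return every OPEN ... START block with no OPEN line in between."""
    return [m.group(0) for m in LINE_PATTERN.finditer(log)]

blocks = find_blocks(SAMPLE_LOG)
print(len(blocks))  # number of well-formed blocks found
for block in blocks:
    print(repr(block))
```

On this sample the stray "OPEN / b" part is skipped, because the line-by-line loop stops as soon as it meets a second OPEN line and the following START check then fails.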

No language is tagged, but if your regex engine supports it, you could prevent some backtracking by using an atomic group (?>...) (or by making the quantifier of the outer non-capturing group possessive with *+):

^OPEN((?>\r?\n(?!(?:OPEN|START)$).*)*)\r?\nSTART

Upvotes: 1

Tim Biegeleisen

Reputation: 521514

I would use a tempered dot here:

\bOPEN\b(?:(?!\bOPEN\b).)*?\bSTART\b

This regex works by:

\bOPEN\b             match an initial "OPEN"
(?:(?!\bOPEN\b).)*?  match all content without passing another "OPEN"
                     until reaching the nearest/first
\bSTART\b            "START"

The second step of the regex above uses the "tempered dot" trick. The negative lookahead is checked before every character the dot consumes, so the match can never pass over another OPEN. If it could, the match would start at an OPEN that is not the one nearest to the closing START.
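
For anyone who wants to try the tempered dot outside a regex tester, a minimal Python sketch follows. The sample log and the helper name count_blocks are invented for the example. It assumes re.DOTALL is wanted so that the dot can cross line breaks, since the content between OPEN and START is multiline.

```python
import re

# Hypothetical sample log: the second block has a stray OPEN with no START.
SAMPLE_LOG = "OPEN\na1\na2\nSTART\nOPEN\nb\nOPEN\nc\nSTART\n"

# Tempered dot from the answer; DOTALL lets "." match newlines as well.
TEMPERED = re.compile(r"\bOPEN\b(?:(?!\bOPEN\b).)*?\bSTART\b", re.DOTALL)

def count_blocks(log):
    """Count OPEN ... START spans that contain no second OPEN."""
    # The pattern has no capturing groups, so findall returns whole matches.
    return len(TEMPERED.findall(log))

print(count_blocks(SAMPLE_LOG))
print(TEMPERED.findall("OPEN x OPEN y START"))
```

Unlike a line-based pattern, this one does not care whether OPEN and START are on their own lines, so it also handles the case where everything is on a single line.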

Upvotes: 0
