Dennis

Reputation: 1848

Regex capture groups with specific word

Sample text:

begin
more text 
art
  id:213213
  code:"XXX"
  name:234
art-
art
  id:543
  name:72
  code:"AAA"
art-
art
  code:"XXX"
  id:32
  name:46
art-
art
  code:"CCC"
  id:8765
art-
art
  id:876
  code:"DDD"
art-
even more text
even more text
end

Target:

I am trying to match the groups that start with art and end with art-, but only where the group contains "XXX".

So I want:

art
  id:213213
  code:"XXX"
  name:234
art-

and

art
  code:"XXX"
  id:32
  name:46
art-

I started with regex101 but did not get far.

Tried:

(?sm)(.*?)(?:art.*?art-)(.*?)

And

(?sm)(.*?)(?:art.*?"XXX".*?art-)(.*?)

Any help would be appreciated.

Upvotes: 0

Views: 736

Answers (2)

The fourth bird

Reputation: 163287

You can match all lines that do not start with art- or contain code:"XXX", then match the line that does contain it, and continue matching lines up to the closing art-:

^art(?:\r?\n(?!art-|\s*code:"XXX").*)*\r?\n\s*code:"XXX"(?:\r?\n(?!art-).*)*\r?\nart-

The pattern matches

  • ^ Start of a line (with the multiline flag; start of the string without it)
  • art Match art
  • (?:\r?\n(?!art-|\s*code:"XXX").*)* Match all following lines that do not start with art- or with optional whitespace followed by code:"XXX"
  • \r?\n\s*code:"XXX" Match a newline and match the line with code:"XXX"
  • (?:\r?\n(?!art-).*)* Continue matching all lines that do not start with art-
  • \r?\nart- Match a newline and art-


No language is tagged, but for example in JavaScript:

const regex = /^art(?:\r?\n(?!art-|\s*code:"XXX").*)*\r?\n\s*code:"XXX"(?:\r?\n(?!art-).*)*\r?\nart-/gm;
const str = `begin
more text 
art
  id:213213
  code:"XXX"
  name:234
art-
art
  id:543
  name:72
  code:"AAA"
art-
art
  code:"XXX"
  id:32
  name:46
art-
art
  code:"CCC"
  id:8765
art-
art
  id:876
  code:"DDD"
art-
even more text
even more text
end`;

// Logs an array with the two art ... art- blocks that contain code:"XXX"
console.log(str.match(regex));
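
The same line-by-line pattern can be built for any code value instead of hard-coding XXX. This is only a minimal sketch: the helper names escapeRegExp and findBlocksWithCode are hypothetical (they are not part of the answer above), and the sample is shortened.

```javascript
// Hypothetical helper: escape regex metacharacters so the code value is matched literally.
function escapeRegExp(literal) {
  return literal.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: return every art ... art- block that contains code:"<code>".
function findBlocksWithCode(text, code) {
  const target = `\\s*code:"${escapeRegExp(code)}"`;
  const pattern = new RegExp(
    `^art(?:\\r?\\n(?!art-|${target}).*)*` + // lines before the target line
      `\\r?\\n${target}` +                   // the target line itself
      `(?:\\r?\\n(?!art-).*)*` +             // remaining lines of the block
      `\\r?\\nart-`,                         // closing marker
    "gm"
  );
  return text.match(pattern) || [];
}

// Shortened sample (assumed for illustration).
const sample = [
  "begin",
  "art",
  "  id:213213",
  '  code:"XXX"',
  "  name:234",
  "art-",
  "art",
  "  id:543",
  '  code:"AAA"',
  "art-",
  "end",
].join("\n");

console.log(findBlocksWithCode(sample, "XXX"));
console.log(findBlocksWithCode(sample, "AAA"));
```

Escaping the value matters if it can contain characters such as a dot, which would otherwise match any character.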

Upvotes: 1

Gurmanjot Singh

Reputation: 10360

We can do it using a tempered greedy token. Try this regex:

art(?:(?!art-)[\s\S])*code:"XXX"(?:(?!art-)[\s\S])*art-


Explanation:

  • art - matches art
  • (?:(?!art-)[\s\S])* - matches 0+ occurrences of any character (including newlines), as long as art- does not start at that position. In short, it matches everything up to the next occurrence of art-
  • code:"XXX" - matches code:"XXX"
  • (?:(?!art-)[\s\S])* - again matches 0+ characters, stopping before the next art-
  • art- - matches art-

Upvotes: 3
