Rubio

Reputation: 1095

Regex - Match one string but capture multiple occurrences of another (.NET)

Given a string:

12345XXX3256|221456000|352456345|221324567|221654000|

I want to match a line if it contains \|221.{3}000 followed later by \|221.{3}(?!000), that is, a 221 data group whose last three digits are zeros, followed by a 221 data group whose last three digits are not all zeros. (The pipe symbol | separates data groups.) I can do this easily with the following regex:

^.+\|221.{3}000.*\|221.{3}(?!000)
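A minimal C# sketch of that line test, assuming a plain console app and using `Regex.IsMatch` from `System.Text.RegularExpressions`. The class and method names (`LineFilter`, `Qualifies`) are hypothetical and only for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

public static class LineFilter
{
    // The question's pattern: a |221xxx000 data group, followed later by a
    // |221xxx data group whose next three characters are not "000".
    const string Pattern = @"^.+\|221.{3}000.*\|221.{3}(?!000)";

    // True if the whole line satisfies the "000 group, then non-000 group" condition.
    public static bool Qualifies(string line) => Regex.IsMatch(line, Pattern);

    public static void Main()
    {
        Console.WriteLine(Qualifies("12345XXX3256|221456000|352456345|221324567|221654000|")); // True
        Console.WriteLine(Qualifies("12345XXX3256|221456000|352456345|"));                     // False
    }
}
```

Note that this only answers "does the line qualify?"; it does not collect the individual |221...000 groups.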

However, what I want to capture is every occurrence of a 221 data group whose last three digits are zeros (listed below).

Group[0]: |221456000

Group[1]: |221654000

I haven't been able to figure out how to match one thing and capture multiple occurrences of another.

Upvotes: 0

Views: 108

Answers (2)

Douglas

Reputation: 54897

var matches = Regex.Matches(s, @"(?:(\|221...000).*?)+\|221...(?!000)...(?:(\|221...000).*?)*");

where

  • (?:(\|221...000).*?)+ will match and individually capture any |221...000 data groups preceding the 221 data group that does not end in 000, requiring at least one such data group
  • \|221...(?!000)... will match, but not capture, the 221 data group that does not end in 000
  • (?:(\|221...000).*?)* will match and individually capture any |221...000 data groups following that 221 data group
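A minimal C# sketch of reading those captures, assuming the first pattern above and a single input line. It relies on .NET's `Group.Captures`, which keeps one `Capture` per iteration of a quantified group rather than only the last one. `SplitGroups` and `Split221` are hypothetical names:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SplitGroups
{
    // Group 1 collects |221xxx000 groups before the non-000 221 group,
    // group 2 collects the ones after it.
    const string Pattern = @"(?:(\|221...000).*?)+\|221...(?!000)...(?:(\|221...000).*?)*";

    // Returns every capture of group 1 and group 2 from the first match (empty arrays if none).
    public static (string[] Before, string[] After) Split221(string line)
    {
        Match m = Regex.Match(line, Pattern);
        if (!m.Success) return (Array.Empty<string>(), Array.Empty<string>());

        // Each quantified iteration leaves its own Capture in Group.Captures.
        string[] before = m.Groups[1].Captures.Cast<Capture>().Select(c => c.Value).ToArray();
        string[] after = m.Groups[2].Captures.Cast<Capture>().Select(c => c.Value).ToArray();
        return (before, after);
    }

    public static void Main()
    {
        var (before, after) = Split221("12345XXX3256|221456000|352456345|221324567|221654000|");
        Console.WriteLine(string.Join(", ", before)); // |221456000
        Console.WriteLine(string.Join(", ", after));  // |221654000
    }
}
```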

Update: The above regex captures all |221...000 occurrences that precede the non-000 221 data group into one group (group 1), and all the |221...000 occurrences that follow it into another (group 2). If you want to capture them all into a single group, I would suggest using a named group, reused in both places:

using System.Linq;
using System.Text.RegularExpressions;
var matches = Regex.Matches(s, @"((?<data>\|221...000).*?)+\|221...(?!000)...((?<data>\|221...000).*?)*");
var captures = matches.Cast<Match>().Select(m => m.Groups["data"].Captures.Cast<Capture>().ToArray()).ToArray();
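A self-contained C# sketch of the named-group version, assuming the same sample line. .NET allows the same group name to appear twice in one pattern, and both occurrences then add their captures to the single `data` group, in the order they were captured. `NamedGroup` and `DataCaptures` are hypothetical names:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class NamedGroup
{
    // Both (?<data>...) occurrences feed the same named group.
    const string Pattern = @"((?<data>\|221...000).*?)+\|221...(?!000)...((?<data>\|221...000).*?)*";

    // One string[] per match, holding every capture of the "data" group for that match.
    public static string[][] DataCaptures(string input) =>
        Regex.Matches(input, Pattern)
             .Cast<Match>()
             .Select(m => m.Groups["data"].Captures.Cast<Capture>().Select(c => c.Value).ToArray())
             .ToArray();

    public static void Main()
    {
        string[][] perMatch = DataCaptures("12345XXX3256|221456000|352456345|221324567|221654000|");
        foreach (string[] caps in perMatch)
            Console.WriteLine(string.Join(", ", caps)); // |221456000, |221654000
    }
}
```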

Upvotes: 1

m.cekiera

Reputation: 5395

I think the easiest way is to match the lines that fit your requirements, and then use another regex (or some other technique) to extract the desired output. That would be more effective than a single regex alone.
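A minimal C# sketch of that two-step approach, assuming the question's regex as the line filter and a plain \|221.{3}000 pattern for the extraction step. `TwoStep` and `Extract` are hypothetical names:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class TwoStep
{
    // Step 1: the question's regex decides whether the line qualifies at all.
    static readonly Regex LineFilter = new Regex(@"^.+\|221.{3}000.*\|221.{3}(?!000)");

    // Step 2: a simple pattern that picks out every |221xxx000 group in the line.
    static readonly Regex Group221000 = new Regex(@"\|221.{3}000");

    public static string[] Extract(string line)
    {
        if (!LineFilter.IsMatch(line)) return Array.Empty<string>();
        return Group221000.Matches(line).Cast<Match>().Select(m => m.Value).ToArray();
    }

    public static void Main()
    {
        string[] groups = Extract("12345XXX3256|221456000|352456345|221324567|221654000|");
        Console.WriteLine(string.Join(", ", groups)); // |221456000, |221654000
    }
}
```

Keeping the two concerns in separate patterns means each one stays simple, and the extraction step returns every |221...000 group wherever it sits in the line.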

However, if you want to use a regex alone, try something like:

(?=(^.*?221\d{3}000.*?221\d{3}(?!000)\d{3}.+))^\w+|(?<=\G)(221.{3}000)|(?<=\G)\w+|(?<=\G)\|
  • (?=(^.*?221\d{3}000.*?221\d{3}(?!000)\d{3}.+)) - positive lookahead that checks the whole line against your condition and captures it in group 1
  • ^\w+ - the beginning of the line, followed by word characters
  • | - or
  • (?<=\G)(221.{3}000) - positive lookbehind for the end of the previous match, followed by the 221.{3}000 part in a capturing group (group 2)
  • | - or
  • (?<=\G)\w+ - positive lookbehind for the end of the previous match, followed by any word characters
  • | - or
  • (?<=\G)\| - positive lookbehind for the end of the previous match, followed by an escaped pipe character (the data group separator)


It separately matches the different elements of a line that fits your requirements, and it also captures each 221\d{3}000 part in group 2. So to get all 221\d{3}000 groups from a given line, you would need to use all matches in which group 2 was captured, and compare them against the whole lines, which are captured in group 1.

It should match all 221\d{3}000 groups no matter how long the line is. However, capturing multiple occurrences directly with a single group is not possible with this approach.

Upvotes: 0
