Reputation: 1095
Given a string:
12345XXX3256|221456000|352456345|221324567|221654000|
I want to match if the line contains \|221.{3}000 followed by \|221.{3}(?!000), that is, data group 221 with three zeros as its last digits, followed by data group 221 without three zeros as its last digits. (The pipe symbol | separates data groups.) This I can easily do with the following regex.
^.+\|221.{3}000.*\|221.{3}(?!000)
However, what I want to capture is all occurrences of data group 221 with three zeros as the last digits:
Group[0]: |221456000
Group[1]: |221654000
I haven't been able to figure out how to match one thing and capture multiple occurrences of another.
Upvotes: 0
Views: 108
Reputation: 54897
var matches = Regex.Matches(s, @"(?:(\|221...000).*?)+\|221...(?!000)...(?:(\|221...000).*?)*");
where
(?:(\|221...000).*?)+
will match and individually capture any |221...000 data groups preceding the 221... group that does not end in 000;
\|221...(?!000)...
will match but not capture the 221... group that does not end in 000;
(?:(\|221...000).*?)*
will match and individually capture any |221...000 data groups succeeding the 221... group that does not end in 000.
Update: The above regex captures all |221...000 occurrences preceding the 221... group that does not end in 000 into one group, and all the |221...000 occurrences succeeding it into another. If you want to capture them into one group, I would suggest using a named group:
var matches = Regex.Matches(s, @"((?<data>\|221...000).*?)+\|221...(?!000)...((?<data>\|221...000).*?)*");
var captures = matches.Cast<Match>().Select(m => m.Groups["data"].Captures.Cast<Capture>().ToArray()).ToArray();
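As a rough, self-contained sketch of how the named-group version could be run against the sample line from the question (assuming C# 9 or later for top-level statements; the variable names `s`, `pattern`, and `found` are only illustrative):

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Sample line taken from the question.
string s = "12345XXX3256|221456000|352456345|221324567|221654000|";

// Both inner groups share the name "data", so .NET collects their
// captures into a single group, in the order they were matched.
string pattern = @"((?<data>\|221...000).*?)+\|221...(?!000)...((?<data>\|221...000).*?)*";

// Flatten the captures of "data" from every match into plain strings.
string[] found = Regex.Matches(s, pattern)
    .Cast<Match>()
    .SelectMany(m => m.Groups["data"].Captures.Cast<Capture>())
    .Select(c => c.Value)
    .ToArray();

foreach (string value in found)
    Console.WriteLine(value);
// Prints:
// |221456000
// |221654000
```

This relies on the .NET behaviour of keeping every capture of a repeated group in Group.Captures; most other regex engines keep only the last one.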
Upvotes: 1
Reputation: 5395
I think the easiest way is to match the lines that fit your requirements, and then use another regex (or another technique) to get the desired output. That would be more effective than a single regex.
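A minimal sketch of that two-step idea, assuming C# 9 or later and reusing the condition regex from the question; the names `condition`, `extractor`, and `groups` are made up for illustration:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Sample line taken from the question.
string line = "12345XXX3256|221456000|352456345|221324567|221654000|";

// Step 1: the question's own regex decides whether the line qualifies.
var condition = new Regex(@"^.+\|221.{3}000.*\|221.{3}(?!000)");

// Step 2: a much simpler regex pulls out every 221 group ending in 000.
var extractor = new Regex(@"\|221.{3}000");

string[] groups = condition.IsMatch(line)
    ? extractor.Matches(line).Cast<Match>().Select(m => m.Value).ToArray()
    : Array.Empty<string>();

foreach (string g in groups)
    Console.WriteLine(g);
// Prints:
// |221456000
// |221654000
```

Keeping the condition and the extraction separate means each pattern stays short and can be changed without touching the other.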
However, if you want to use just a regex, try something like:
(?=(^.*?221\d{3}000.*?221\d{3}[1-9]{3}.+))^\w+|(?<=\G)(221.{3}000)|(?<=\G)\w+|(?<=\G)\|
- (?=(^.*?221\d{3}000.*?221\d{3}[1-9]{3}.+)) - positive lookahead for your regex,
- ^\w+ - beginning of a line with word characters,
- | - or
- (?<=\G)(221.{3}000) - positive lookbehind for the end of the previous match, followed by the 221.{3}000 part in a capturing group,
- | - or
- (?<=\G)\w+ - positive lookbehind for the end of the previous match, followed by any word characters,
- | - or
- (?<=\G)\| - positive lookbehind for the end of the previous match, followed by a literal pipe (the escaped logical "OR" character).
It separately matches the different elements of a line that fits your requirements, and it also captures the (221.{3}000) part in group 2. So if you want to get every (221.{3}000) from a given line, you would need to use all matches where group 2 was captured, and compare the whole lines, which are captured in group 1.
It should match all (221.{3}000) no matter how long the line is.
However, direct multiple matching by one group is not possible here.
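One way the matches from this single regex might be post-processed is sketched below, assuming C# 9 or later and assuming that \G inside a lookbehind behaves in .NET as described above. The names `tokenizer`, `all`, `qualifies`, and `hits` are illustrative only. The check on group 1 is there because, on a line that fails the lookahead, the \w+ alternative can still match at the start of the input.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Sample line taken from the question.
string input = "12345XXX3256|221456000|352456345|221324567|221654000|";

// The regex from this answer: the first alternative validates the whole
// line, the others walk through it token by token from the previous match.
var tokenizer = new Regex(
    @"(?=(^.*?221\d{3}000.*?221\d{3}[1-9]{3}.+))^\w+|(?<=\G)(221.{3}000)|(?<=\G)\w+|(?<=\G)\|");

var all = tokenizer.Matches(input).Cast<Match>().ToList();

// Group 1 is only set when the lookahead accepted the whole line.
bool qualifies = all.Count > 0 && all[0].Groups[1].Success;

// Keep only the matches in which group 2 (the 221...000 part) took part.
string[] hits = qualifies
    ? all.Where(m => m.Groups[2].Success).Select(m => m.Groups[2].Value).ToArray()
    : Array.Empty<string>();

foreach (string hit in hits)
    Console.WriteLine(hit);
```

Unlike the other patterns in this thread, group 2 here does not include the leading pipe.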
Upvotes: 0