Reputation: 117
I'm trying to capture the value of a keyword that is delimited by either another keyword or the end of the line. The keywords may be repeated, may appear in any order, and may have no data to capture:
Keywords: k1, k2
Input data: somedatahereornotk1capturethis1k2capturethis2k2capturethis3k1k2
I want the captured data to be
1. capturethis1
2. capturethis2
3. capturethis3
4.
5.
I've tried k1|k2(?<Data>.*?)k1|k2, but the captured data is always empty.
Thanks!
Upvotes: 3
Views: 416
Reputation: 32807
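using System;
using System.Text.RegularExpressions;

string s = "somedatahereornotk1capturethis1k2capturethis2k2capturethis3k1k2";
// The lookbehind and lookahead assert the keywords without consuming them.
Regex r = new Regex("(?<=k1|k2).*?(?=k1|k2)");
foreach (Match m in r.Matches(s))
    Console.WriteLine(m.Value);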
Upvotes: 0
Reputation: 57939
You are on the right track with the alternations. The missing piece is to use look-behind and look-ahead to assert that something must be preceded and followed by the delimiters.
(?<=k1|k2)(?<Data>.*?)(?=k1|k2)
Lookbehind (?<=…) and lookahead (?=…) are zero-width assertions, so they must be satisfied but do not become part of the match.
Capturing instances of consecutive delimiters is a bit trickier, because the match between them is empty: it is just the zero-width position between two characters. One approach is to capture the delimiter itself inside the lookbehind (or lookahead), so that each match records which keyword it follows:
(?<=(?<Delimiter>k1|k2))(?<Data>.*?)(?=k1|k2)
This yields 4 matches for your sample data, the last one being the empty match between the consecutive k1k2 at the end. The Delimiter group adds extra data to each match (k1, k2, k2, k1), which you can simply ignore.
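To make the delimiter-capturing pattern concrete, here is a minimal sketch. It goes one step beyond the pattern above by adding `|$` to the lookahead, which is my own assumption about how to cover the "end of the line" case from the question; the class name is only illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

class DelimiterCaptureDemo
{
    static void Main()
    {
        string input = "somedatahereornotk1capturethis1k2capturethis2k2capturethis3k1k2";

        // Pattern from this answer, plus an assumed "|$" in the lookahead so
        // that a keyword at the very end of the input also produces a match.
        var regex = new Regex(@"(?<=(?<Delimiter>k1|k2))(?<Data>.*?)(?=k1|k2|$)");

        foreach (Match m in regex.Matches(input))
        {
            // Delimiter is the keyword just before the data; Data may be empty.
            Console.WriteLine("{0}: '{1}'",
                m.Groups["Delimiter"].Value,
                m.Groups["Data"].Value);
        }
        // Prints:
        // k1: 'capturethis1'
        // k2: 'capturethis2'
        // k2: 'capturethis3'
        // k1: ''
        // k2: ''
    }
}
```

With the `|$` alternative the sample input gives five matches, which lines up with the five entries listed in the question. Without it, the final k2 has nothing after it to satisfy the lookahead and the fifth match is not reported.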
Upvotes: 3
Reputation: 30273
First, be aware that the alternation operator | has low precedence, so
k1|k2(?<Data>.*?)k1|k2
is actually looking for k1 or k2(?<Data>.*?)k1 or k2. Use grouping:
(?:k1|k2)(?<Data>.*?)(?:k1|k2)
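Second, consider using the zero-width lookahead and lookbehind assertions. The grouped pattern above consumes the closing delimiter of each match, so that delimiter cannot also open the next match; the assertions test for the delimiters without consuming them: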
(?<=k1|k2)(?<Data>.*?)(?=k1|k2)
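The difference between the three patterns is easiest to see side by side. The following is a rough sketch, not a definitive implementation; `DataValues` and `Print` are hypothetical helpers introduced only for this comparison.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

class PrecedenceDemo
{
    // Hypothetical helper: the quoted Data group of every match of the pattern.
    static string[] DataValues(string pattern, string input) =>
        Regex.Matches(input, pattern)
             .Cast<Match>()
             .Select(m => "'" + m.Groups["Data"].Value + "'")
             .ToArray();

    // Hypothetical helper: prints a label followed by the collected values.
    static void Print(string label, string[] values) =>
        Console.WriteLine("{0}: [{1}]", label, string.Join(", ", values));

    static void Main()
    {
        string input = "somedatahereornotk1capturethis1k2capturethis2k2capturethis3k1k2";

        // Ungrouped: parsed as  k1  |  k2(?<Data>.*?)k1  |  k2
        Print("ungrouped", DataValues(@"k1|k2(?<Data>.*?)k1|k2", input));
        // ungrouped: ['', 'capturethis2k2capturethis3', '']

        // Grouped: correct precedence, but each match consumes its closing
        // delimiter, so the data that follows that delimiter is skipped.
        Print("grouped", DataValues(@"(?:k1|k2)(?<Data>.*?)(?:k1|k2)", input));
        // grouped: ['capturethis1', 'capturethis3']

        // Lookarounds: delimiters are asserted but not consumed.
        Print("lookaround", DataValues(@"(?<=k1|k2)(?<Data>.*?)(?=k1|k2)", input));
        // lookaround: ['capturethis1', 'capturethis2', 'capturethis3', '']
    }
}
```

Only the lookaround version picks up capturethis2, because the k2 that ends the first match is still available to start the second one.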
Upvotes: 3