Dvlpr2878

Reputation: 117

C# Regex capturing repeated keyword values

I'm trying to capture the value that follows a keyword, where each value is delimited by either another keyword or the end of the line. The keywords may be repeated, may appear in any order, and may have no data to capture:

Keywords: K1,K2

Input data: somedatahereornotk1capturethis1k2capturethis2k2capturethis3k1k2

I want the captured data to be

1. capturethis1
2. capturethis2
3. capturethis3
4. (empty)
5. (empty)

I've tried k1|k2(?<Data>.*?)k1|k2, but the captured data is always empty.

Thanks!

Upvotes: 3

Views: 416

Answers (3)

Anirudha

Reputation: 32807

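using System;
using System.Text.RegularExpressions;

string s = "somedatahereornotk1capturethis1k2capturethis2k2capturethis3k1k2";

// Match the text between two keywords; the lookarounds keep the keywords out of the match.
Regex r = new Regex("(?<=k1|k2).*?(?=k1|k2)");
foreach (Match m in r.Matches(s))
    Console.WriteLine(m.Value);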

Upvotes: 0

Jay

Reputation: 57939

You are on the right track with the alternations. The missing piece is to use look-behind and look-ahead to assert that something must be preceded and followed by the delimiters.

(?<=k1|k2)(?<Data>.*?)(?=k1|k2)

Lookbehind (?<=…) and lookahead (?=…) are zero-width assertions, so they must be satisfied but do not become part of the match.

Your desire to capture instances of consecutive delimiters is a bit trickier, because you can't really capture "nothing": the space between two characters. One approach would be to capture the lookbehind (or lookahead) as well:

(?<=(?<Delimiter>k1|k2))(?<Data>.*?)(?=k1|k2)

This will yield 4 results instead of 3, because it will include the consecutive k1k2 at the end of your sample data. You'll just have to ignore the extra data for each match (k1,k2,k2,k1).
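
To make the Delimiter/Data idea concrete, here is a minimal sketch of how the pattern above might be used from C#. The class name KeywordCapture and the helper Extract are made up for illustration; only the pattern and the sample input come from this thread.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of any library.
public static class KeywordCapture
{
    // Returns one (Delimiter, Data) pair per match of the pattern from the answer.
    public static List<(string Delimiter, string Data)> Extract(string input)
    {
        var regex = new Regex(@"(?<=(?<Delimiter>k1|k2))(?<Data>.*?)(?=k1|k2)");
        var results = new List<(string Delimiter, string Data)>();

        foreach (Match m in regex.Matches(input))
        {
            // Groups captured inside the lookbehind are still available on the match.
            results.Add((m.Groups["Delimiter"].Value, m.Groups["Data"].Value));
        }

        return results;
    }

    public static void Main()
    {
        string input = "somedatahereornotk1capturethis1k2capturethis2k2capturethis3k1k2";

        foreach (var (delimiter, data) in Extract(input))
        {
            // Data is an empty string when two keywords are adjacent.
            Console.WriteLine($"{delimiter}: '{data}'");
        }
    }
}
```

An empty Data value together with a non-empty Delimiter is how a keyword with nothing after it should show up in the results.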

Upvotes: 3

Andrew Cheong

Reputation: 30273

First, be aware that the alternation operator | has low precedence, so

k1|k2(?<Data>.*?)k1|k2

is actually looking for k1 or k2(?<Data>.*?)k1 or k2. Use grouping:

(?:k1|k2)(?<Data>.*?)(?:k1|k2)

Second, consider using the zero-width lookahead and lookbehind assertions:

(?<=k1|k2)(?<Data>.*?)(?=k1|k2)
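
A rough way to see why the lookaround form matters is to run both patterns against the sample input and compare the Data values. In the sketch below, PatternComparison and DataValues are hypothetical names. The third pattern, which adds $ to the lookahead so that the end of the line also acts as a delimiter, is my own assumption based on the question's wording and does not come from either pattern above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of any library.
public static class PatternComparison
{
    // Collects the "Data" group of every match of the given pattern.
    public static List<string> DataValues(string pattern, string input)
    {
        var values = new List<string>();
        foreach (Match m in Regex.Matches(input, pattern))
        {
            values.Add(m.Groups["Data"].Value);
        }
        return values;
    }

    public static void Main()
    {
        string input = "somedatahereornotk1capturethis1k2capturethis2k2capturethis3k1k2";

        // Grouped, but each match consumes its closing keyword, so that keyword
        // cannot also open the next match.
        string consuming = @"(?:k1|k2)(?<Data>.*?)(?:k1|k2)";

        // Zero-width assertions leave the keywords available to the next match.
        string lookaround = @"(?<=k1|k2)(?<Data>.*?)(?=k1|k2)";

        // Assumption: also accept the end of the input as a closing delimiter.
        string withEnd = @"(?<=k1|k2)(?<Data>.*?)(?=k1|k2|$)";

        foreach (string pattern in new[] { consuming, lookaround, withEnd })
        {
            Console.WriteLine($"{pattern} -> [{string.Join(", ", DataValues(pattern, input))}]");
        }
    }
}
```

On the sample input, the consuming pattern should skip capturethis2, because the k2 that closes the first match has already been used up. The lookaround versions should pick up all three values plus the empty entries.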

Upvotes: 3
