Reputation: 1068
I have a string that contains multiple occurrences of a tagged substring. Each of these substrings has the following format: <c=@someText>Content<c>
Example:
This combination of plain text and <c=@flavor> colored text<c> is valid. <c=@warning>Multiple tags are also valid.<c>
I want to extract each of the substrings via regex. However, if I use the regex <c=@.+?(?=>)>.*<c> it matches everything from the first <c... to the last <c>. What I want is each of those substrings as one item. How can I do this, and if I can't do it with regex, what would be the best way to achieve my goal?
Upvotes: 0
Views: 2658
Reputation: 15364
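using System.Linq;
using System.Text.RegularExpressions;

string input = @"This combination of plain text and <c=@flavor> colored text<c> is valid. <c=@warning>Multiple tags are also valid.<c>";
// Both quantifiers are lazy (.+?), so each match ends at the first <c> after its opening tag.
var matches = Regex.Matches(input, @"<c=@(.+?)>(.+?)<c>")
    .Cast<Match>()
    .Select(m => new
    {
        Name = m.Groups[1].Value,   // tag name, e.g. "flavor"
        Value = m.Groups[2].Value   // text between the opening tag and <c>
    })
    .ToList();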
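The part that matters relative to the regex in the question is the lazy quantifier: .+? (or .*?) stops at the first <c> that follows, while the greedy .* runs on to the last one. A minimal sketch of the difference, assuming the same input string; the variable names greedy and lazy are only illustrative:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string input = @"This combination of plain text and <c=@flavor> colored text<c> is valid. <c=@warning>Multiple tags are also valid.<c>";

// Greedy: .* consumes as much as possible, so a single match spans both tags.
var greedy = Regex.Matches(input, @"<c=@.+?>.*<c>")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToList();

// Lazy: .*? stops at the first closing <c>, so each tag becomes its own match.
var lazy = Regex.Matches(input, @"<c=@.+?>.*?<c>")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToList();

Console.WriteLine(greedy.Count); // 1
Console.WriteLine(lazy.Count);   // 2
foreach (var item in lazy)
    Console.WriteLine(item);
```

With the lazy version, each element of lazy is one complete tagged substring, which seems to be what the question asks for.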
Upvotes: 1
Reputation: 25370
You can use named capture groups, along with lookaheads and lookbehinds, to grab the 'type' and 'text':
var pattern = @"(?<=<c=@)(?<type>[^>]+)>(?<text>.+?)(?=<c>)";
var str = @"This combination of plain text and <c=@flavor> colored text<c> is valid. <c=@warning>Multiple tags are also valid.<c>";
foreach (Match match in Regex.Matches(str, pattern))
{
    Console.WriteLine(match.Groups["type"].Value);
    Console.WriteLine(match.Groups["text"].Value);
    Console.WriteLine();
}
output:
flavor
colored text
warning
Multiple tags are also valid.
pattern:
(?<=<c=@) : Look for <c=@
(?<type>[^>]+)> : Grab everything until a >, call it type
(?<text>.+?) : Grab everything until the lookahead, call it text
(?=<c>) : Stop when you find a <c>
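One consequence of the lookbehind and lookahead being zero-width is that match.Value covers only the type, the > and the text, not the surrounding tags. If the whole tagged substring is wanted as one item, one option is to consume the delimiters instead of asserting them while keeping the named groups. A sketch under that assumption; the items list is a hypothetical helper, not part of the answer above:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

var str = @"This combination of plain text and <c=@flavor> colored text<c> is valid. <c=@warning>Multiple tags are also valid.<c>";

// Same named groups, but <c=@ and <c> are matched directly,
// so match.Value is the full tagged substring.
var pattern = @"<c=@(?<type>[^>]+)>(?<text>.+?)<c>";

var items = new List<string>();
foreach (Match match in Regex.Matches(str, pattern))
{
    items.Add(match.Value);
    Console.WriteLine(match.Value);
    Console.WriteLine(match.Groups["type"].Value);
    Console.WriteLine(match.Groups["text"].Value);
    Console.WriteLine();
}
```

The named groups still give the type and text separately, so nothing from the lookaround version is lost.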
Upvotes: 1