Reputation: 2398
I'm trying to extract the values in a string that sit between << and >>. There can be several of them in one string.
Can anyone help with a regular expression to match these:
this is a test for <<bob>> who like <<books>>
test 2 <<frank>> likes nothing
test 3 <<what>> <<on>> <<earth>> <<this>> <<is>> <<too>> <<much>>.
I then want to foreach the GroupCollection to get all the values.
Any help greatly received. Thanks.
Upvotes: 32
Views: 59442
Reputation: 11788
Use positive lookbehind and lookahead assertions to check for the angle brackets without including them in the match, and use .*? to match the shortest possible sequence of characters between them. Find all values by iterating the MatchCollection returned by the Matches() method.
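// Requires: using System; using System.Text.RegularExpressions;
var regex = new Regex("(?<=<<).*?(?=>>)");
// Declare the loop variable as Match: with var it is typed as object,
// and match.Value would not compile.
foreach (Match match in regex.Matches(
    "this is a test for <<bob>> who like <<books>>"))
{
    Console.WriteLine(match.Value);
}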
Upvotes: 62
Reputation: 626738
While Peter's answer is a good example of using lookarounds to check the left- and right-hand context, I'd also like to add a LINQ (lambda) way to access matches/groups and show the use of simple numbered capturing groups, which come in handy when you want to extract only a part of the pattern:
using System.Linq;
using System.Collections.Generic;
using System.Text.RegularExpressions;
// ...
var results = Regex.Matches(s, @"<<(.*?)>>", RegexOptions.Singleline)
    .Cast<Match>()
    .Select(x => x.Groups[1].Value);
The same approach works with the Regex object from Peter's answer, where the whole match value is accessed via Match.Value:
var results = regex.Matches(s).Cast<Match>().Select(x => x.Value);
Notes:
<<(.*?)>> is a regex matching <<, then capturing any 0 or more chars, as few as possible (due to the non-greedy *? quantifier), into Group 1, and then matching >>.
RegexOptions.Singleline makes . match newline (LF) chars, too (it does not match them by default).
Cast<Match>() casts the match collection to an IEnumerable<Match> that you can process further with a lambda.
Select(x => x.Groups[1].Value) returns only the Group 1 value from the current x match object.
Add .ToList() or .ToArray() after Select if you need a list or an array.
In the demo C# code, string.Join(", ", results) generates a comma-separated string of the Group 1 values:
var strs = new List<string> { "this is a test for <<bob>> who like <<books>>",
                              "test 2 <<frank>> likes nothing",
                              "test 3 <<what>> <<on>> <<earth>> <<this>> <<is>> <<too>> <<much>>." };
foreach (var s in strs)
{
    var results = Regex.Matches(s, @"<<(.*?)>>", RegexOptions.Singleline)
        .Cast<Match>()
        .Select(x => x.Groups[1].Value);
    Console.WriteLine(string.Join(", ", results));
}
Output:
bob, books
frank
what, on, earth, this, is, too, much
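To make the RegexOptions.Singleline point concrete, here is a minimal sketch that runs the same pattern with and without the option. The input string with a line break inside a value is a made-up example, and SinglelineDemo and Extract are illustrative names only.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SinglelineDemo
{
    // Returns the Group 1 value of every <<...>> match, using the given options.
    public static string[] Extract(string input, RegexOptions options)
    {
        return Regex.Matches(input, @"<<(.*?)>>", options)
            .Cast<Match>()
            .Select(m => m.Groups[1].Value)
            .ToArray();
    }

    public static void Main()
    {
        // Hypothetical input: the second value contains a line break.
        string input = "<<bob>> likes <<long\nbooks>>";

        // Without Singleline, "." stops at "\n", so the second pair is skipped.
        Console.WriteLine(Extract(input, RegexOptions.None).Length);       // 1

        // With Singleline, "." also matches "\n", so both pairs are found.
        Console.WriteLine(Extract(input, RegexOptions.Singleline).Length); // 2
    }
}
```

If the values never span lines, the option makes no difference and can be left out.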
Upvotes: 4
Reputation: 4526
Something like this:
(<<(?<element>[^>]*)>>)*
This program might be useful:
http://sourceforge.net/projects/regulator/
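For reading the named group from code, a rough sketch could look like the following. It assumes the pattern is applied without the outer (...)* wrapper, since a starred group can also match the empty string and only joins pairs that directly follow one another. NamedGroupDemo and Extract are illustrative names, not part of the answer above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NamedGroupDemo
{
    // Collects the "element" named group from every <<...>> match.
    // Assumption: the outer (...)* of the original pattern is dropped.
    public static List<string> Extract(string input)
    {
        var values = new List<string>();
        foreach (Match m in Regex.Matches(input, @"<<(?<element>[^>]*)>>"))
        {
            values.Add(m.Groups["element"].Value);
        }
        return values;
    }

    public static void Main()
    {
        var values = Extract("test 3 <<what>> <<on>> <<earth>>");
        Console.WriteLine(string.Join(", ", values)); // what, on, earth
    }
}
```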
Upvotes: 0
Reputation: 2514
You can try one of these:
(?<=<<)[^>]+(?=>>)
(?<=<<)\w+(?=>>)
However, you will have to iterate over the returned MatchCollection.
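The two patterns differ in what they accept between the brackets, which the sketch below tries to show. The input with a space inside a value is a made-up example, and PatternComparison and Extract are illustrative names.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class PatternComparison
{
    // Returns every whole-match value for the given pattern.
    public static string[] Extract(string input, string pattern)
    {
        return Regex.Matches(input, pattern)
            .Cast<Match>()
            .Select(m => m.Value)
            .ToArray();
    }

    public static void Main()
    {
        // Hypothetical input: the first value contains a space.
        string input = "<<bob smith>> and <<frank>>";

        // [^>]+ accepts any characters except '>', so the space is fine.
        Console.WriteLine(string.Join(", ", Extract(input, @"(?<=<<)[^>]+(?=>>)"))); // bob smith, frank

        // \w+ accepts only word characters, so "bob smith" is not matched.
        Console.WriteLine(string.Join(", ", Extract(input, @"(?<=<<)\w+(?=>>)")));   // frank
    }
}
```

So \w+ is the stricter choice when the values are known to be single words, while [^>]+ also allows spaces and punctuation other than >.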
Upvotes: 3