Reputation: 3446
I have a regular expression that works perfectly:
^SENT KV(?<singlelinedata> L(?<line>[1-9]\d*) (?<measureline>\d+)(?: (?<samplingpoint>\d+))+)+$
My input string looks like this:
SENT KV L1 123 1 2 3 L2 456 4 5 6
The only question is: how do I get the context of all captures of the group "samplingpoint"?
This group contains 6 captures, but I need the context information too. Three of them lie in the first capture of the group "singlelinedata" and three in the second capture. How do I get this information?
A capture of a group has no property that exposes the captures of the groups nested inside it.
I know that I could write one regex to match the whole string and then run a second regex over each "singlelinedata" capture.
I'm looking for a way that works with the specified regex.
Hope someone can help me.
Upvotes: 0
Views: 432
Reputation: 89231
// LINQPad-style program: Dump() is the LINQPad output helper.
void Main()
{
    string data = @"SENT KV L1 123 1 2 3 L2 456 4 5 6";
    Parse(data).Dump();
}

public class Result
{
    public int Line;
    public int MeasureLine;
    public List<int> SamplingPoints;
}

private Regex pattern = new Regex(@"^SENT KV(?<singlelinedata> L(?<line>[1-9]\d*) (?<measureline>\d+)(?: (?<samplingpoint>\d+))+)+$", RegexOptions.Multiline);

public IEnumerable<Result> Parse(string data)
{
    foreach (Match m in pattern.Matches(data))
    {
        // One "singlelinedata" capture per "L<n> ..." block.
        foreach (Capture c1 in m.Groups["singlelinedata"].Captures)
        {
            var result = new Result();
            result.Line = int.Parse(m.Groups["line"].CapturesWithin(c1).First().Value);
            result.MeasureLine = int.Parse(m.Groups["measureline"].CapturesWithin(c1).First().Value);
            result.SamplingPoints = new List<int>();
            foreach (Capture c2 in m.Groups["samplingpoint"].CapturesWithin(c1))
            {
                result.SamplingPoints.Add(int.Parse(c2.Value));
            }
            yield return result;
        }
    }
}

public static class RegexExtensions
{
    // Yields the captures of `group` that start inside the span of `capture`.
    public static IEnumerable<Capture> CapturesWithin(this Group group, Capture capture)
    {
        foreach (Capture c in group.Captures)
        {
            if (c.Index < capture.Index) continue;                 // before the span
            if (c.Index >= capture.Index + capture.Length) break;  // past the span
            yield return c;
        }
    }
}
Edit: Rewritten as an extension method on Group.
Upvotes: 0
Reputation: 3446
Based on Markus Jarderot's answer, I wrote an extension method for groups that takes a capture and returns all captures of that group that lie within the specified capture.
The extension method looks like this:
public static IEnumerable<Capture> CapturesWithin(this Group source, Capture captureContainingGroup)
{
    var lowerIndex = captureContainingGroup.Index;
    var upperIndex = lowerIndex + captureContainingGroup.Length - 1;
    foreach (var capture in source.Captures.Cast<Capture>())
    {
        if (capture.Index < lowerIndex)
        {
            continue;
        }
        if (capture.Index > upperIndex)
        {
            break;
        }
        yield return capture;
    }
}
Usage of this method:
foreach (var capture in match.Groups["singlelinedata"].Captures.Cast<Capture>())
{
    var samplingpoints = match.Groups["samplingpoint"].CapturesWithin(capture).ToList();
    ...
Upvotes: 0
Reputation: 3732
One way without doing lots of index matching and keeping a single regex is to change the capture groups to all have the same name. The nested captures actually get pushed onto the stack first so you end up with an array like this:
["1", "123", "1", "2", "3", "L1 123 1 2 3", "2", "456", "4", "5", "6", "L2 456 4 5 6"]
Then it's just a matter of some LINQ craziness to split the result into groups when a capture containing an L is found and then pulling out the data from each group.
var regex = new Regex(@"^SENT KV(?<singlelinedata> L(?<singlelinedata>[1-9]\d*) (?<singlelinedata>\d+)(?: (?<singlelinedata>\d+))+)+$");
var matches = regex.Matches("SENT KV L1 123 1 2 3 L2 456 4 5 6 12 13 L3 789 7 8 9 10");
var singlelinedata = matches[0].Groups["singlelinedata"];
string groupKey = null;
var result = singlelinedata.Captures.OfType<Capture>()
    .Reverse()
    .GroupBy(key => groupKey = key.Value.Contains("L") ? key.Value : groupKey, value => value.Value)
    .Reverse()
    .Select(group => new { key = group.Key, data = group.Skip(1).Reverse().ToList() })
    .Select(item => new { line = item.data.First(), measureline = item.data.Skip(1).First(), samplingpoints = item.data.Skip(2).ToList() })
    .ToList();
Upvotes: 0
Reputation: 22749
There's no concept of "subgroups" in the regex API. A group can have multiple captures, but the API doesn't tell you which samplingpoint capture belongs to which line capture.
Your only option is to use the character index to calculate it yourself.
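As a rough sketch of that index-based approach, assuming the regex and input string from the question (the class name `Program` and the local variable names are only placeholders): a samplingpoint capture is treated as belonging to a singlelinedata capture when it starts inside that capture's character range.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // Regex and input copied from the question.
        var regex = new Regex(@"^SENT KV(?<singlelinedata> L(?<line>[1-9]\d*) (?<measureline>\d+)(?: (?<samplingpoint>\d+))+)+$");
        Match match = regex.Match("SENT KV L1 123 1 2 3 L2 456 4 5 6");

        foreach (Capture outer in match.Groups["singlelinedata"].Captures)
        {
            int start = outer.Index;
            int end = outer.Index + outer.Length; // exclusive upper bound

            // Keep only the sampling points that start inside [start, end).
            var points = match.Groups["samplingpoint"].Captures
                .Cast<Capture>()
                .Where(c => c.Index >= start && c.Index < end)
                .Select(c => c.Value);

            Console.WriteLine($"{outer.Value.Trim()} -> {string.Join(",", points)}");
        }
    }
}
```

For the input above this should print one line per L block, with 1,2,3 listed for L1 and 4,5,6 for L2. The same range check works for the line and measureline groups.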
Upvotes: 0