Sebastian Schumann

Reputation: 3446

Regex: Multiple captures in multiple captures

I have a regular expression that works perfectly.

^SENT KV(?<singlelinedata> L(?<line>[1-9]\d*) (?<measureline>\d+)(?: (?<samplingpoint>\d+))+)+$

My input string looks like this:

SENT KV L1 123 1 2 3 L2 456 4 5 6

The only question is: how do I get the context of all captures of the group "samplingpoint"?

This group contains 6 captures, but I need the context information too. Three of them belong to the first capture of the group "singlelinedata" and three to the second capture. How can I get this information?

A capture of a group has no property that exposes the captures of the groups nested inside it.
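To illustrate the problem, here is a minimal sketch that lists what the API returns for "samplingpoint". The class and method names (`FlatCapturesDemo`, `Describe`) are made up for this illustration; only the pattern comes from above.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class FlatCapturesDemo
{
    // The pattern from the question, unchanged.
    private static readonly Regex Pattern = new Regex(
        @"^SENT KV(?<singlelinedata> L(?<line>[1-9]\d*) (?<measureline>\d+)(?: (?<samplingpoint>\d+))+)+$");

    // Returns "value@index" for every capture of the named group, in match order.
    public static string[] Describe(string input, string groupName)
    {
        Match match = Pattern.Match(input);
        return match.Groups[groupName].Captures
            .Cast<Capture>()
            .Select(c => c.Value + "@" + c.Index)
            .ToArray();
    }

    public static void Main()
    {
        const string input = "SENT KV L1 123 1 2 3 L2 456 4 5 6";

        // Prints: 1@15, 2@17, 3@19, 4@28, 5@30, 6@32
        Console.WriteLine(string.Join(", ", Describe(input, "samplingpoint")));
    }
}
```

All six captures come back as one flat list. The only hint about which "singlelinedata" capture each one belongs to is its character index.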

I know that I can write a single regex to match the whole string and then run a second regex to parse each "singlelinedata" capture.
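For reference, that two-pass workaround might look roughly like this. It is only a sketch: `TwoPassParser`, `LineData`, `Outer` and `Inner` are hypothetical names, and the inner pattern is simply the nested part of my regex anchored on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public class LineData
{
    public int Line;
    public int MeasureLine;
    public List<int> SamplingPoints;
}

public static class TwoPassParser
{
    // First pass: validates the whole string and captures each " L<n> ..." block.
    private static readonly Regex Outer = new Regex(
        @"^SENT KV(?<singlelinedata> L[1-9]\d* \d+(?: \d+)+)+$");

    // Second pass: takes a single block apart.
    private static readonly Regex Inner = new Regex(
        @"^ L(?<line>[1-9]\d*) (?<measureline>\d+)(?: (?<samplingpoint>\d+))+$");

    public static List<LineData> Parse(string input)
    {
        var results = new List<LineData>();
        Match outer = Outer.Match(input);
        if (!outer.Success) return results;

        foreach (Capture block in outer.Groups["singlelinedata"].Captures)
        {
            // Each block was already accepted by the outer pattern, so this match succeeds.
            Match inner = Inner.Match(block.Value);
            results.Add(new LineData
            {
                Line = int.Parse(inner.Groups["line"].Value),
                MeasureLine = int.Parse(inner.Groups["measureline"].Value),
                SamplingPoints = inner.Groups["samplingpoint"].Captures
                    .Cast<Capture>()
                    .Select(c => int.Parse(c.Value))
                    .ToList()
            });
        }
        return results;
    }

    public static void Main()
    {
        foreach (LineData d in Parse("SENT KV L1 123 1 2 3 L2 456 4 5 6"))
        {
            Console.WriteLine("L" + d.Line + " " + d.MeasureLine + ": " + string.Join(" ", d.SamplingPoints));
        }
    }
}
```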

I'm looking for a way that works with the specified regex.

Hope someone can help me.

Upvotes: 0

Views: 432

Answers (4)

Markus Jarderot

Reputation: 89231

void Main()
{
    string data = @"SENT KV L1 123 1 2 3 L2 456 4 5 6";
    Parse(data).Dump();
}

public class Result
{
    public int Line;
    public int MeasureLine;
    public List<int> SamplingPoints;
}

private Regex pattern = new Regex(@"^SENT KV(?<singlelinedata> L(?<line>[1-9]\d*) (?<measureline>\d+)(?: (?<samplingpoint>\d+))+)+$", RegexOptions.Multiline);

public IEnumerable<Result> Parse(string data)
{
    foreach (Match m in pattern.Matches(data))
    {
        foreach (Capture c1 in m.Groups["singlelinedata"].Captures)
        {
            // c1 is one " L<n> ..." block; the inner captures are matched to it by character index.

            var result = new Result();
            result.Line = int.Parse(m.Groups["line"].CapturesWithin(c1).First().Value);
            result.MeasureLine = int.Parse(m.Groups["measureline"].CapturesWithin(c1).First().Value);

            result.SamplingPoints = new List<int>();
            foreach (Capture c2 in m.Groups["samplingpoint"].CapturesWithin(c1))
            {
                result.SamplingPoints.Add(int.Parse(c2.Value));
            }

            yield return result;
        }
    }
}

public static class RegexExtensions
{
    public static IEnumerable<Capture> CapturesWithin(this Group group, Capture capture)
    {
        foreach (Capture c in group.Captures)
        {
            if (c.Index < capture.Index) continue;
            if (c.Index >= capture.Index + capture.Length) break;

            yield return c;
        }
    }
}

Edit: Rewritten as an extension method on Group.

Upvotes: 0

Sebastian Schumann

Reputation: 3446

Based on Markus Jarderot's answer, I wrote an extension method for groups that takes a capture and returns all captures of that group that lie within the specified capture.

The extension method looks like this:

    public static IEnumerable<Capture> CapturesWithin(this Group source, Capture captureContainingGroup)
    {
        var lowerIndex = captureContainingGroup.Index;
        var upperIndex = lowerIndex + captureContainingGroup.Length - 1;

        foreach (var capture in source.Captures.Cast<Capture>())
        {
            if (capture.Index < lowerIndex)
            {
                continue;
            }

            if (capture.Index > upperIndex)
            {
                break;
            }

            yield return capture;
        }
    }

Usage of this method:

foreach (var capture in match.Groups["singlelinedata"].Captures.Cast<Capture>())
{
    var samplingpoints = match.Groups["samplingpoint"].CapturesWithin(capture).ToList();
    ...

Upvotes: 0

David Ewen

Reputation: 3732

One way to avoid lots of index matching and still keep a single regex is to give all the capture groups the same name. The nested captures actually get pushed onto the stack first, so for the input from the question you end up with an array like this (each outer capture keeps its leading space):

["1", "123", "1", "2", "3", " L1 123 1 2 3", "2", "456", "4", "5", "6", " L2 456 4 5 6"]

Then it's just a matter of some LINQ craziness: split the captures into groups whenever a capture containing an L is found, then pull the data out of each group.

var regex = new Regex(@"^SENT KV(?<singlelinedata> L(?<singlelinedata>[1-9]\d*) (?<singlelinedata>\d+)(?: (?<singlelinedata>\d+))+)+$");
var matches = regex.Matches("SENT KV L1 123 1 2 3 L2 456 4 5 6 12 13 L3 789 7 8 9 10");
var singlelinedata = matches[0].Groups["singlelinedata"];

string groupKey = null;
var result = singlelinedata.Captures.OfType<Capture>()
    .Reverse()
    .GroupBy(key => groupKey = key.Value.Contains("L") ? key.Value : groupKey, value => value.Value)
    .Reverse()
    .Select(group => new { key = group.Key, data = group.Skip(1).Reverse().ToList() })
    .Select(item => new { line = item.data.First(), measureline = item.data.Skip(1).First(), samplingpoints = item.data.Skip(2).ToList() })
    .ToList();

Upvotes: 0

Eli Arbel

Reputation: 22749

There's no concept of "subgroups" in the regex API. A group can have multiple captures, but you can't know which samplingpoint belongs to which line.

Your only option is to use the character index to calculate it yourself.
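A minimal sketch of that index calculation, assuming the group names from the question. `CaptureIndexDemo`, `Contains` and `SamplingPointsPerLine` are hypothetical names used only for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class CaptureIndexDemo
{
    // True when 'inner' lies completely inside the character range of 'outer'.
    private static bool Contains(Capture outer, Capture inner)
    {
        return inner.Index >= outer.Index
            && inner.Index + inner.Length <= outer.Index + outer.Length;
    }

    // For every "singlelinedata" capture, collects the "samplingpoint" values inside it.
    public static List<List<string>> SamplingPointsPerLine(Match match)
    {
        List<Capture> points = match.Groups["samplingpoint"].Captures.Cast<Capture>().ToList();
        return match.Groups["singlelinedata"].Captures
            .Cast<Capture>()
            .Select(line => points.Where(p => Contains(line, p)).Select(p => p.Value).ToList())
            .ToList();
    }

    public static void Main()
    {
        var regex = new Regex(
            @"^SENT KV(?<singlelinedata> L(?<line>[1-9]\d*) (?<measureline>\d+)(?: (?<samplingpoint>\d+))+)+$");
        Match match = regex.Match("SENT KV L1 123 1 2 3 L2 456 4 5 6");

        // Prints "1 2 3" and then "4 5 6".
        foreach (List<string> perLine in SamplingPointsPerLine(match))
        {
            Console.WriteLine(string.Join(" ", perLine));
        }
    }
}
```

This scans every sampling point once per line, which is fine for short inputs. For long inputs, an early-exit loop over the ordered captures avoids the repeated scan.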

Upvotes: 0
