Oscar Mederos
Oscar Mederos

Reputation: 29863

Regular Expressions: about Greediness, Laziness and Substrings

I have the following string:

123322

In theory, the regex 1.*2 should match the following:

If I use the regex 1.*2 it matches 123322.
Using 1.*?2, it will match 12.

Is there a way to match 12332 too?

The perfect thing would be to get all possible matchess in the string (no matter if one match is substring of another)

Upvotes: 1

Views: 257

Answers (4)

amcashcow
amcashcow

Reputation: 734

1(.*?2)*$

you will have multiple captures which you can concatenate to form all possible matches

see here:regex tester

click on 'table' and expand the captures tree

Upvotes: 1

Kobi
Kobi

Reputation: 138097

Generally, this isn't possible. A regex matching engine isn't really designed to find overlapping matches. A quick solution is simply to check the pattern on all substrings manually:

string text = "1123322";
for (int start = 0; start < text.Length - 1; start++)
{
    for (int length = 0; length <= text.Length - start; length++)
    {
        string subString = text.Substring(start, length);
        if (Regex.IsMatch(subString, "^1.*2$"))
            Console.WriteLine("{0}-{1}: {2}", start, start + length, subString);
    }
}

Working example: http://ideone.com/aNKnJ

Now, is it possible to get a whole-regex solution? Mostly, the answer is no. However, .Net does has a few tricks in its sleeve to help us: it allows variable length lookbehind, and allows each capturing group to remember all captures (most engines only return the last match of each group). Abusing these, we can simulate the same for loop inside the regex engine:

string text = "1123322!";
string allMatchesPattern = @"
(?<=^       # Starting at the local end position, look all the way to the back
(
  (?=(?<Here>1.*2\G))?  # on each position from the start until here (\G),
  .                     # *try* to match our pattern and capture it,
)*                      # but advance even if you fail to match it.
)
";

MatchCollection matches = Regex.Matches(text, allMatchesPattern,
            RegexOptions.ExplicitCapture | RegexOptions.IgnorePatternWhitespace);
foreach (Match endPosition in matches)
{
    foreach (Capture startPosition in endPosition.Groups["Here"].Captures)
    {
        Console.WriteLine("{0}-{1}: {2}", startPosition.Index,
                          endPosition.Index - 1, startPosition.Value);
    }
}

Note that currently there's a small bug there - the engine doesn't try to match the last ending position ($), so you loose a few matches. For now, adding a ! at the end of the string solves that issue.

working example: http://ideone.com/eB8Hb

Upvotes: 1

user123444555621
user123444555621

Reputation: 153174

You would need a separate expression for each case, depending on the number of twos you want to match:

1(.*?2){1}   #same as 1.*?2
1(.*?2){2}
1(.*?2){3}
...

Upvotes: 1

RDL
RDL

Reputation: 7961

No, unless there is something else added to the regex to clarify what it should do it will either be greedy or non-greedy. There is no in-betweeny ;)

Upvotes: 2

Related Questions