user3749947

Reputation: 101

How do I use RegEx to pick longest match?

I looked for an answer to this question but couldn't find anything, and I hope there's an easy solution. I am using the following code in C#:

String pattern = ("(hello|hello world)");
Regex regex = new Regex(pattern, RegexOptions.IgnoreCase);
var matches = regex.Matches("hello world");

Is there a way for the Matches method to return the longest match first? In this case, I want to get "hello world" as my match rather than just "hello". This is just an example; my actual pattern list contains a fair number of words.

Upvotes: 9

Views: 6291

Answers (3)

hwnd

Reputation: 70732

A regular expression engine tries the alternatives of an alternation from left to right. If you want to make sure you get the longest possible match first, you'll need to change the order of your alternatives. The leftmost alternative is tried first. If it matches, the engine attempts to match the rest of the pattern against the rest of the string; the next alternative is tried only if no overall match can be found that way.

String pattern = ("(hello world|hello wor|hello)");
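If the word list is only known at run time, the same longest-first ordering can be built programmatically. The following is a minimal sketch, assuming the alternatives are literal words (hence the call to Regex.Escape); the LongestFirst class and BuildPattern helper are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class LongestFirst
{
    // Hypothetical helper: sorts literal alternatives longest-first so that
    // the ordered alternation tries the longer words before the shorter ones.
    public static string BuildPattern(IEnumerable<string> words)
    {
        var ordered = words
            .OrderByDescending(w => w.Length)
            .Select(w => Regex.Escape(w)); // treat each word as literal text
        return "(" + string.Join("|", ordered) + ")";
    }

    public static void Main()
    {
        string pattern = BuildPattern(new[] { "hello", "hello world", "hello wor" });
        Match m = Regex.Match("hello world", pattern, RegexOptions.IgnoreCase);
        Console.WriteLine(m.Value); // prints "hello world"
    }
}
```

Sorting by length is only enough for literal alternatives; if the alternatives are themselves patterns, their string length says nothing about how much text they will match.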

Upvotes: 2

gunr2171

Reputation: 17578

Make two different regex matches. The first will match your longer option, and if that does not work, the second will match your shorter option.

string input = "hello world";

string patternFull = "hello world";
Regex regexFull = new Regex(patternFull, RegexOptions.IgnoreCase);

var matches = regexFull.Matches(input);

if (matches.Count == 0)
{
    string patternShort = "hello";
    Regex regexShort = new Regex(patternShort, RegexOptions.IgnoreCase);
    matches = regexShort.Matches(input);
}

At the end, matches will hold the output of either "full" or "short". "full" is checked first, and "short" is only tried if "full" finds nothing.

You can wrap the logic in a function if you plan on calling it many times. This is something I came up with (but there are plenty of other ways you can do this).

public bool HasRegexMatchInOrder(string input, params string[] patterns)
{
    foreach (var pattern in patterns)
    {
        Regex regex = new Regex(pattern, RegexOptions.IgnoreCase);

        if (regex.IsMatch(input))
        {
            return true;
        }
    }

    return false;
}

string input = "hello world";
bool hasAMatch = HasRegexMatchInOrder(input, "hello world", "hello", ...);

Upvotes: 0

Amal Murali

Reputation: 76636

If you already know the lengths of the words beforehand, then put the longest first. For example:

String pattern = ("(hello world|hello)");

The longest alternative will be tried, and therefore matched, first. If you don't know the lengths beforehand, you can't hard-code the order this way.

An alternative approach would be to store all the matches in an array/hash/list and pick the longest one manually, using the language's built-in functions.
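Picking the longest match manually could look roughly like this. It is a sketch under assumptions: each pattern is run on its own, only the first match of each pattern is considered, and the LongestMatch class and Find helper are hypothetical names, not part of .NET.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class LongestMatch
{
    // Hypothetical helper: tries every pattern separately and returns the
    // longest matched text, or null when none of the patterns match.
    public static string Find(string input, params string[] patterns)
    {
        return patterns
            .Select(p => Regex.Match(input, p, RegexOptions.IgnoreCase))
            .Where(m => m.Success)
            .OrderByDescending(m => m.Length) // longest match first
            .Select(m => m.Value)
            .FirstOrDefault();
    }

    public static void Main()
    {
        // The order of the patterns does not matter here.
        Console.WriteLine(Find("hello world", "hello", "hello world")); // prints "hello world"
    }
}
```

This costs one regex evaluation per pattern, so for a large word list a single alternation with a known order is likely to be cheaper.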

Upvotes: 9
