Athafoud

Reputation: 3000

regex match partial or whole word

I am trying to figure out a regular expression that matches either a whole word or a prefix of it with a predefined minimum length (e.g. the first 4 characters).

For example, if I am trying to match the word "between" and my offset is set to 4, then

between betwee betwe betw

are matches, but the following are not:

bet betweenx bet12 betw123 beta

I have created an example in regex101, where I am trying (with no luck) a combination of positive lookahead (?=) and a non-word boundary \B.

I found a similar question which proposes a workaround in its accepted answer. As I understand it, the code overrides the matcher somehow so that it runs all the possible regular expressions, based on the word and an offset.

My code has to be written in C#, so I am trying to convert the aforementioned code. As far as I can see, Regex.Replace (and I assume Regex.Match also) can accept delegates to override the default functionality, but I cannot make it work.

Upvotes: 1

Views: 7949

Answers (3)

Phate01

Reputation: 1795

You can use this regex

\b(bet(?:[^\s]){1,4})\b

And replace bet and the 4 dynamically like this:

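public static string CreateRegex(string word, int minLen)
{
    // Fixed part: the first minLen - 1 characters, e.g. "bet" for ("between", 4).
    string token = word.Substring(0, minLen - 1);
    // Followed by 1 to minLen further non-whitespace characters, between word boundaries.
    string pattern = @"\b(" + token + @"(?:[^\s]){1," + minLen + @"})\b";

    return pattern;
}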

Here's a demo: https://regex101.com/r/lH0oL2/1

EDIT: as for the bet1122 match, you can edit the pattern this way:

\b(bet(?:[^\s0-9]){1,4})\b

If you don't want to match certain characters, just add them to the negated character class [^...].

Demo: https://regex101.com/r/lH0oL2/2
For more info, see http://www.regular-expressions.info/charclass.html
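To see how a pattern of this shape behaves on the sample words from the question, a small check like the following may help. It is only a sketch: the class and method names (PrefixPatternCheck, AcceptedWords) are made up for illustration and are not part of the answer.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PrefixPatternCheck
{
    // Hypothetical helper: returns the words in which the pattern finds a match,
    // in the same order as the input.
    public static List<string> AcceptedWords(string pattern, IEnumerable<string> words)
    {
        var regex = new Regex(pattern);
        var accepted = new List<string>();
        foreach (var word in words)
        {
            if (regex.IsMatch(word))
                accepted.Add(word);
        }
        return accepted;
    }
}
```

Called with the pattern \b(bet(?:[^\s0-9]){1,4})\b and the nine sample words from the question, this returns between, betwee, betwe, betw and also beta, because only the first three letters are fixed and any non-digit characters may follow them.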

Upvotes: 1

Domysee

Reputation: 12846

You could take the first 4 characters and make the remaining ones optional.
Then wrap these in word boundaries and parentheses.

So in the case of "between", it would be

@"\b(betw)(?:(e|ee|een)?)\b"

The code to achieve that would be:

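// Requires: using System.Linq;
public string createRegex(string word, int count)
{
    // The first `count` characters are mandatory.
    var mandatory = word.Substring(0, count);
    // Every prefix of the remaining characters is an optional continuation, e.g. e|ee|een.
    // The range must cover word.Length - count characters, not count - 1.
    var optional = "(" + String.Join("|", Enumerable.Range(1, word.Length - count).Select(i => word.Substring(count, i))) + ")?";
    var regex = @"\b(" + mandatory + ")(?:" + optional + @")\b";
    return regex;
}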
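As a quick sanity check, the pattern quoted above for "between" can be run over the sample words from the question. This is a minimal sketch; PartialWordCheck and FindMatches are hypothetical names introduced here, not part of the answer.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class PartialWordCheck
{
    // The pattern quoted above for "between" with a 4-character minimum.
    public const string BetweenPattern = @"\b(betw)(?:(e|ee|een)?)\b";

    // The sample words from the question, separated by spaces.
    public const string Sample =
        "between betwee betwe betw bet betweenx bet12 betw123 beta";

    // Hypothetical helper: lists the text of every match of the pattern.
    public static List<string> FindMatches(string pattern, string text)
    {
        return Regex.Matches(text, pattern)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToList();
    }
}
```

FindMatches(BetweenPattern, Sample) returns between, betwee, betwe and betw. The other five words are rejected because either "betw" is missing or no word boundary follows one of the allowed endings.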

Upvotes: 3

Ferit

Reputation: 9657

The code in the answer you mentioned simply builds up this:

betw|betwe|betwee|between

So all you need is a function that builds a string from the prefixes of the given word, starting at the given minimum length.

static String BuildRegex(String word, int min_len)
{
    String toReturn = "";
    for (int i = 0; i < word.Length - min_len + 1; i++)
    {
        toReturn += word.Substring(0, min_len + i);
        toReturn += "|";
    }

    toReturn = toReturn.Substring(0, toReturn.Length - 1);

    return toReturn;
}
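Note that a bare alternation such as betw|betwe|betwee|between also finds "betw" inside "betweenx". One possible variant, sketched here under the assumptions that 1 <= minLen <= word.Length and that the word starts and ends with a word character, wraps the alternatives in \b(?: ... )\b and escapes them. AnchoredPrefixRegex is a hypothetical name, not code from the linked answer.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class AnchoredPrefixRegex
{
    // Hypothetical helper: builds \b(?:between|betwee|betwe|betw)\b style patterns.
    // Assumes 1 <= minLen <= word.Length; other values are not handled here.
    public static string Build(string word, int minLen)
    {
        // Prefix lengths minLen..word.Length, longest first, each escaped
        // so that regex metacharacters in the word are taken literally.
        var prefixes = Enumerable.Range(minLen, word.Length - minLen + 1)
                                 .Reverse()
                                 .Select(len => Regex.Escape(word.Substring(0, len)));
        return @"\b(?:" + string.Join("|", prefixes) + @")\b";
    }
}
```

With this, Regex.IsMatch("betwe", AnchoredPrefixRegex.Build("between", 4)) is true, while "bet" and "betweenx" are rejected.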


Upvotes: 1
