michen00
michen00

Reputation: 853

Most efficient regex for checking if a string contains at least 3 alphanumeric characters

I have this regex:

(?:.*[a-zA-Z0-9].*){3}

I use it to see if a string has at least 3 alphanumeric characters in it. It seems to work.

Examples of strings it should match:

'a3c'
'_0_c_8_'
' 9 9d '

However, I need it to work faster. Is there a better way to use regex to match the same patterns?


Edit: I ended up using this regex for my purposes:

(?:[^a-zA-Z0-9]*[a-zA-Z0-9]){3}

(no modifiers needed)

Upvotes: 7

Views: 1706

Answers (3)

Wiktor Stribiżew
Wiktor Stribiżew

Reputation: 626893

The most efficient regex approach is to use the principle of contrast, i.e. using opposite character classes side by side. Here is a regex that can be used to check if a string has 3 Latin script letters or digits:

^(?:[^a-zA-Z0-9]*[a-zA-Z0-9]){3}

See demo.

In case you need a full string match, you will need to append .* (or .*$ if you want to guarantee you will match all up to the end of string/line), but in my tests on regexhero, .* yields better performance):

^(?:[^a-zA-Z0-9]*[a-zA-Z0-9]){3}.*

Also, a lot depends on the engine. PCRE has auto-optimizations in place that consists in auto-possessification (i.e. it turns the * to *+ in (?:[^a-zA-Z0-9]*+).

See more details on password validation optimizations here.

Upvotes: 6

paxdiablo
paxdiablo

Reputation: 881563

Just consider this. Regular expressions are powerful because they're expressive and very flexible (with features such as look-ahead, greedy consumption and back-tracking). There will almost always be a cost to that, however minor.

If you want raw speed (and you're willing to give up the expressiveness), you may find that it's faster to bypass regular expressions altogether and just evaluate the string, such as with the following pseudo-code:

def hasThreeAlphaNums(str):
    alphanums = 0
    for pos = 0 to len(str) - 1:
        if str[pos] in set "[a-zA-Z0-9]":
            alphanums++
            if alphanums == 3:
                return true
    return false

It's a parser (a very simple one in this case), a tool that can be even more powerful than regular expressions. For a more concrete example, consider the following C code:

#include <ctype.h>
int hasThreeAlphaNums (char *str) {
    int count = 0;
    for (int ch = *str; ch != '\0'; str++)
        if (isalnum (ch))
            if (++count == 3)
                return 1;
    return 0;
}

Now, as to whether or not that's faster for this specific case, that depends on many factors, such as whether the language is interpreted or compiled, how efficient the regex is under the covers, and so on.

That's why the mantra of optimisation is "Measure, don't guess!" You should evaluate the possibilities in your target environment.

Upvotes: 2

vks
vks

Reputation: 67968

(?:.*?[a-zA-Z0-9]){3}.*

You can use this.This is much faster and takes much lesser steps than yours.See demo.You probably would want to use ^$ anchors too to make sure there are no partial matches.

https://regex101.com/r/nS2lT4/32

The reason is

(?:.*[a-zA-Z0-9].*){3}

                ^^

This actually consumes the whole string and then engine has to backtrack.When using the other regex this is avoided

Upvotes: 3

Related Questions