Reputation: 853
I have this regex:
(?:.*[a-zA-Z0-9].*){3}
I use it to see if a string has at least 3 alphanumeric characters in it. It seems to work.
Examples of strings it should match:
'a3c'
'_0_c_8_'
' 9 9d '
However, I need it to work faster. Is there a better way to use regex to match the same patterns?
Edit: I ended up using this regex for my purposes:
(?:[^a-zA-Z0-9]*[a-zA-Z0-9]){3}
(no modifiers needed)
Upvotes: 7
Views: 1706
Reputation: 626893
The most efficient regex approach is to use the principle of contrast, i.e. using opposite character classes side by side. Here is a regex that can be used to check if a string has 3 Latin script letters or digits:
^(?:[^a-zA-Z0-9]*[a-zA-Z0-9]){3}
In case you need a full string match, you will need to append .*
(or .*$
if you want to guarantee you will match all up to the end of string/line), but in my tests on regexhero, .*
yields better performance):
^(?:[^a-zA-Z0-9]*[a-zA-Z0-9]){3}.*
Also, a lot depends on the engine. PCRE has auto-optimizations in place that consists in auto-possessification (i.e. it turns the *
to *+
in (?:[^a-zA-Z0-9]*+
).
See more details on password validation optimizations here.
Upvotes: 6
Reputation: 881563
Just consider this. Regular expressions are powerful because they're expressive and very flexible (with features such as look-ahead, greedy consumption and back-tracking). There will almost always be a cost to that, however minor.
If you want raw speed (and you're willing to give up the expressiveness), you may find that it's faster to bypass regular expressions altogether and just evaluate the string, such as with the following pseudo-code:
def hasThreeAlphaNums(str):
alphanums = 0
for pos = 0 to len(str) - 1:
if str[pos] in set "[a-zA-Z0-9]":
alphanums++
if alphanums == 3:
return true
return false
It's a parser (a very simple one in this case), a tool that can be even more powerful than regular expressions. For a more concrete example, consider the following C code:
#include <ctype.h>
int hasThreeAlphaNums (char *str) {
int count = 0;
for (int ch = *str; ch != '\0'; str++)
if (isalnum (ch))
if (++count == 3)
return 1;
return 0;
}
Now, as to whether or not that's faster for this specific case, that depends on many factors, such as whether the language is interpreted or compiled, how efficient the regex is under the covers, and so on.
That's why the mantra of optimisation is "Measure, don't guess!" You should evaluate the possibilities in your target environment.
Upvotes: 2
Reputation: 67968
(?:.*?[a-zA-Z0-9]){3}.*
You can use this.This is much faster and takes much lesser steps than yours.See demo.You probably would want to use ^$
anchors too to make sure there are no partial matches.
https://regex101.com/r/nS2lT4/32
The reason is
(?:.*[a-zA-Z0-9].*){3}
^^
This actually consumes the whole string and then engine has to backtrack.When using the other regex this is avoided
Upvotes: 3