Reputation: 85
I'm opening this thread, which is really similar to another one, because I cannot figure out an issue: I have an input field that allows an alphanumeric string with an optional single space as a separator, then optionally another alphanumeric string, and so on. I found this regex:
^([0-9a-zA-z]+ ?)*$
It works! But performance is really bad as soon as there are 2 consecutive spaces in a long sentence and those 2 spaces are far into the sentence. In the example below, the result comes back in half a second if I put the 2 spaces at the beginning of the sentence, but it takes 10 seconds or more if they are far along.
dzdff5464zdiophjazdioj ttttttttt zoddzdffdziophjazdioj ttttttttt zoddzdffdzdff ttttt zoddzdfff ttttt zoddzdfff ttttt zoddzdfff ttttt zoddzdfff ttttt zoddzdfff ttttt zoddzdfff ttttt zoddzdfff ttttt zoddzdfff ttttt zo999  ddzdfff ttttt zoddzdfff ttttt zoddzdff
The 2 spaces are after the 999.
Do you have any ideas or suggestions for improving this regex?
Thanks and regards
PF
PS: you can reproduce the issue as soon as you enter any invalid character far into the string; it is not specific to 2 spaces.
EDIT: another example: 12345678901234567890' is 20 characters plus 1 invalid character, and the result is immediate. Add 5 valid characters and the regex takes 5 seconds: 1234567890123456789012345'
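A rough model of why 5 extra characters cost so much: with no spaces in the input, every way of cutting the run of valid characters into non-empty pieces is a separate way for ([0-9a-zA-z]+ ?)* to consume it, and a backtracking engine can try each of them before concluding that the trailing ' never matches. The sketch below only counts those splits; countSplits is a hypothetical helper, not a trace of any real engine.

```javascript
// Count the ways a run of n valid characters (no spaces) can be divided
// among iterations of ([0-9a-zA-z]+ ?)*. Each non-empty chunk corresponds
// to one iteration of the group, with " ?" matching nothing.
function countSplits(n) {
  // ways[i] = number of ways to cover the first i characters with chunks
  const ways = new Array(n + 1).fill(0);
  ways[0] = 1; // the empty prefix is covered in exactly one way
  for (let i = 1; i <= n; i++) {
    for (let j = 0; j < i; j++) {
      ways[i] += ways[j]; // the last chunk covers characters j..i-1
    }
  }
  return ways[n];
}

for (const n of [20, 25]) {
  console.log(n, countSplits(n)); // 20 524288, then 25 16777216
}
```

The count is 2^(n-1), so going from 20 to 25 characters multiplies the number of candidate splits by 32.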
Upvotes: 1
Views: 1408
Reputation: 15010
I suggest changing the expression to something like this:
(?i)^[0-9a-z]+(?:\s[0-9a-z]+)*$
This is functionally similar in that it matches alphanumeric words delimited by a single space. A major difference is that I moved the initial word check to the front of the expression, then used a non-capturing group (?:...) for the remaining space-delimited words.
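Non-capturing groups (?:...) are faster than capturing groups (...) because the regex engine doesn't need to retain matched values. And by moving the space \s to the front of the repeated word group, every repetition has to start with a space, so the engine can no longer split one long word across several iterations of the group. That ambiguity in ([0-9a-zA-z]+ ?)*, where the space is optional, is what lets the original pattern backtrack so badly when the match fails.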
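For reference, here is a sketch of the same pattern in JavaScript. JavaScript regexes do not accept a leading (?i) inline modifier, so the sketch uses the i flag instead; isValidInput is a hypothetical helper name. Unlike the original pattern, this one rejects the empty string and a trailing space, and \s also accepts tabs and newlines, so swap \s for a literal space if only spaces should be allowed.

```javascript
// The suggested pattern, with (?i) replaced by the i flag because
// JavaScript does not support a leading inline (?i) modifier.
// No g flag, so test() keeps no lastIndex state between calls.
const WORDS = /^[0-9a-z]+(?:\s[0-9a-z]+)*$/i;

// Hypothetical helper: true when the input is one or more alphanumeric
// words separated by single whitespace characters.
function isValidInput(text) {
  return WORDS.test(text);
}

console.log(isValidInput("abc DEF 123")); // true
console.log(isValidInput("abc  def"));    // false: two consecutive spaces
console.log(isValidInput("abc "));        // false: trailing space
console.log(isValidInput(""));            // false: empty input
```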
You also have a typo in your character class [0-9a-zA-z]: the last z should probably be upper case. The A-z range covers every ASCII character from A to z, which includes [ \ ] ^ _ and ` as well as the letters, so it will likely give some odd, unexpected results. In my expression I simply added (?i) to the beginning to switch the regex engine into case-insensitive mode, and reduced the character class to [0-9a-z].
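A quick way to see which extra characters the A-z range admits is to walk the character codes from A to z and keep anything that is not a letter. This is a small illustrative check in JavaScript; extras is just an illustrative name.

```javascript
// Walk every character from "A" (U+0041) to "z" (U+007A) and keep the
// ones that are not letters: these are what A-z adds beyond a-zA-Z.
const extras = [];
for (let code = "A".charCodeAt(0); code <= "z".charCodeAt(0); code++) {
  const ch = String.fromCharCode(code);
  if (!/[a-zA-Z]/.test(ch)) {
    extras.push(ch);
  }
}
console.log(extras.join(" ")); // [ \ ] ^ _ `

// As a result, the question's class accepts input such as "a_b".
console.log(/^[0-9a-zA-z]+$/.test("a_b")); // true
console.log(/^[0-9a-zA-Z]+$/.test("a_b")); // false
```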
In my testing, your expression ^([0-9a-z]+ ?)*$ takes about 0.03 seconds to process your sample text with 2 extra spaces toward the end, while my recommended expression completes the same test in about 0.000022 seconds. Wow, that's an amazing delta: more than three orders of magnitude.
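If you want to reproduce this kind of comparison yourself, a minimal timing sketch in JavaScript might look like the following. It uses the question's digit example rather than the long sample, because the original pattern's cost grows exponentially and a long failing input can hang the engine. timeTest is a hypothetical helper, and the timings will vary by engine and machine, so they will not match the figures quoted above.

```javascript
// The question's pattern and the suggested one (i flag instead of (?i)).
const original = /^([0-9a-zA-z]+ ?)*$/;
const suggested = /^[0-9a-z]+(?:\s[0-9a-z]+)*$/i;

// Hypothetical helper: run one test() and report elapsed milliseconds.
// performance.now() is available in browsers and in Node.js 16+.
function timeTest(re, input) {
  const start = performance.now();
  const matched = re.test(input);
  return { matched, ms: performance.now() - start };
}

// Digits followed by one invalid character, as in the question's EDIT.
// Lengths stay small on purpose: the original pattern's work roughly
// doubles with every extra digit.
const results = [];
for (const n of [10, 14, 18]) {
  const input = "1234567890".repeat(2).slice(0, n) + "'";
  const a = timeTest(original, input);
  const b = timeTest(suggested, input);
  results.push({ n, a, b });
  console.log(
    `n=${n}: original ${a.ms.toFixed(3)} ms, suggested ${b.ms.toFixed(3)} ms`
  );
}
```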
Upvotes: 1
Reputation: 43419
This is a simpler regex using \w (the word class):
^([\w]+(\s*))$
It's instantaneous in JavaScript:
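// Sample input from the question
const input = "dzdff5464zdiophjazdioj ttttttttt zoddzdffdziophjazdioj ttttttttt zoddzdffdzdff ttttt zoddzdfff ttttt zoddzdfff ttttt zoddzdfff ttttt zoddzdfff ttttt zoddzdfff ttttt zoddzdfff ttttt zoddzdfff ttttt zoddzdfff ttttt zo999 ddzdfff ttttt zoddzdfff ttttt zoddzdff";
// Unanchored and global: matches each run of word characters plus any
// whitespace after it, so replace() turns every word into "boo".
const re = /([\w]+(\s*))/g;
console.log(input.replace(re, "boo"));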
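Because the anchored pattern ^([\w]+(\s*))$ has no repeated group, it cannot backtrack exponentially, but it also checks something narrower than the question asks for. A quick sketch of what it accepts (the test strings are illustrative only):

```javascript
// The anchored pattern from this answer. With no outer quantifier it can
// only match one run of word characters followed by optional whitespace.
const anchored = /^([\w]+(\s*))$/;

console.log(anchored.test("dzdff5464"));    // true: a single word
console.log(anchored.test("dzdff5464   ")); // true: \s* allows trailing spaces
console.log(anchored.test("dzdff ttttt"));  // false: a second word is not allowed
console.log(anchored.test("snake_case"));   // true: \w also matches "_"
```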
Upvotes: 0