Suraj h k

Reputation: 173

Internal engine error in regex

I am trying to write a regex for the below use-case:

1. The name must begin with a letter (alphabet).
2. Should only contain alphanumeric characters.
3. No special characters are allowed except the underscore.
4. Should not have two or more consecutive underscores.
5. Cannot end with an underscore.

My regex for this is ^[A-Za-z]+(?!.*[_]{2,})([a-zA-Z0-9_ ][a-zA-Z0-9]+)*$. This regex gives the expected output for the inputs below:

Customer_info
customer info_1
customer
customer___info
cust_info_1_f
ksldfhlksjdhfjskdhfsjdklfhslkdhfsdklfhsdkhfsdklhfdskhfklsdhfkdlshfklsdfhsdklhfsdklfh

But it breaks on some special cases, including the inputs below:

zxbnczmxncbzxnmcbzmxncbzxnxbczxmnbcvmznbxvcbzxnmcmzxvczx,zxc
zjkhsadhskjdhakjsdhaksjhdjkashdaskjdhaskdhaskjdhaskjdhasjkdhaskjdkajshd/ksajkhdashdjkad.asdjk,sa

It fails with a message saying "internal engine error". What is the reason behind this error? I did some research here, but it didn't help much in my scenario. Please tell me what is wrong with my regex that causes this error in some cases.

Note: I was using this tool for testing my regex.

Upvotes: 2

Views: 505

Answers (1)

Wiktor Stribiżew

Reputation: 627371

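Your regex fails because of catastrophic backtracking caused by the ([a-zA-Z0-9_ ][a-zA-Z0-9]+)* subpattern: the first character class can match the same chars as the second one, and the group is *-quantified, so the engine can split one run of letters and digits across the group's iterations in a huge number of ways before it gives up.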
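To get a feel for how fast the number of alternatives grows, here is a rough sketch in Python (not the engine you were testing with). It only counts the ways a run of n letters/digits can be cut into chunks of two or more characters, where each chunk stands for one iteration of ([a-zA-Z0-9_ ][a-zA-Z0-9]+). The count_splits helper is a hypothetical name of mine, and a real engine may explore more or fewer paths than this simplified count suggests.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_splits(n: int) -> int:
    """Count the ways to cut a run of n letters/digits into chunks of length >= 2.

    Each chunk models one iteration of ([a-zA-Z0-9_ ][a-zA-Z0-9]+):
    one char for the first class, one or more for the second.
    """
    if n == 0:
        return 1  # nothing left to split: exactly one (empty) way
    # Pick the length k of the first chunk, then split whatever remains.
    # For n == 1 the range is empty, so the result is 0.
    return sum(count_splits(n - k) for k in range(2, n + 1))


for n in (5, 10, 20, 40):
    print(n, count_splits(n))
```

The counts follow the Fibonacci sequence, so they grow exponentially with the length of the run. With inputs like yours (50+ letters followed by a char the pattern can never match), the engine may have to reject a very large number of these splits one by one, which is likely what the tool reports as an internal engine error.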

Also, placing the negative lookahead after the +-quantified [A-Za-z] pattern opens up many more matching paths than necessary.

You may fix your expression with

^[A-Za-z][a-zA-Z0-9]*([_ ][a-zA-Z0-9]+)*$

Details:

  • ^ - start of string
  • [A-Za-z] - the first symbol must be a letter
  • [a-zA-Z0-9]* - zero or more letters/digits may follow
  • ([_ ][a-zA-Z0-9]+)* - zero or more sequences of
    • [_ ] - a _ or space
    • [a-zA-Z0-9]+ - 1 or more letters or digits
  • $ - end of the string.

See the regex demo
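If you want to check the fixed pattern locally, here is a minimal sketch using Python's re module (an assumption on my side, since the question does not say which language the regex will run in). The is_valid_name helper is a hypothetical name; the pattern itself is the one from above, unchanged.

```python
import re

# The fixed pattern from the answer, unchanged.
NAME_RE = re.compile(r"^[A-Za-z][a-zA-Z0-9]*([_ ][a-zA-Z0-9]+)*$")


def is_valid_name(name: str) -> bool:
    """Return True if the whole string satisfies the fixed pattern."""
    # fullmatch is used because in Python `$` also matches right before
    # a trailing newline, which would let "customer\n" slip through.
    return NAME_RE.fullmatch(name) is not None


samples = [
    "Customer_info",
    "customer info_1",
    "customer",
    "customer___info",  # consecutive underscores
    "cust_info_1_f",
    "customer_",        # trailing underscore
    "1customer",        # does not start with a letter
    "zxbnczmxncbzxnmcbzmxncbzxnxbczxmnbcvmznbxvcbzxnmcmzxvczx,zxc",
]

for s in samples:
    print(is_valid_name(s), s)
```

Since each position in the string can now be matched by only one part of the pattern, the long input with the comma is rejected right away instead of triggering heavy backtracking.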

Upvotes: 1
