sjbennett85

Reputation: 45

My regex that should only accept Latin-based characters is acting strangely

I've written a regex, to the best of my ability, that should allow only the Latin character set, with an optional '-' that, if included, MUST be followed by at least one other Latin character.

My RegEx:

[\u00BF-\u1FFF\u2C00-\uD7FFA-Za-z]+(?:[-]?[\u00BF-\u1FFF\u2C00-\uD7FFA-Za-z]+)

I came to this after reading a few posts and rereading the manual to figure out the best way to approach this. This check is attached to a text field where a user types only their first name and then submits.

It works okay but there is certainly room for improvement.

Examples:

Tom         // passes  
Éve         // passes  
John-Paul   // passes  
2pac        // passes and removes numbers (not really what I want)  
John316     // passes and removes numbers (not really what I want)  

What I would REALLY want to happen is a fail on those last two checks.
How would I revise it to get the outcome I'd like?

Upvotes: 1

Views: 78

Answers (1)

Wiktor Stribiżew

Reputation: 626699

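You need to anchor the regex by adding ^ at the start and $ at the end. Without anchors, the regex succeeds as soon as any substring fits the pattern (for example, pac inside 2pac), so the digits are simply skipped. With anchors, the whole input string must match, and any other symbol makes the check fail.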
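To make the effect of the anchors concrete, here is a minimal sketch that compares the pattern from the question with the same pattern wrapped in ^ and $. The variable names unanchored, anchored and partial are only for illustration.

```javascript
// Pattern exactly as posted in the question (no anchors).
const unanchored = /[\u00BF-\u1FFF\u2C00-\uD7FFA-Za-z]+(?:[-]?[\u00BF-\u1FFF\u2C00-\uD7FFA-Za-z]+)/;

// Same pattern, only with ^ and $ added.
const anchored = /^[\u00BF-\u1FFF\u2C00-\uD7FFA-Za-z]+(?:[-]?[\u00BF-\u1FFF\u2C00-\uD7FFA-Za-z]+)$/;

// Without anchors the engine skips the leading digit and matches the rest.
const partial = '2pac'.match(unanchored);
console.log(partial[0]);            // => pac

// With anchors the whole string has to match, so the digit causes a failure.
console.log(anchored.test('2pac')); // => false
console.log(anchored.test('Tom'));  // => true
```

This is why the original check appeared to "pass and remove numbers": the match succeeded on a substring rather than on the full input.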

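I also suggest enhancing the pattern by moving the ? from after the hyphen to the end of the group. The hyphen is then required inside the group and the group as a whole is optional, so there is only one way to split the letters between the two parts. That limits backtracking and should keep matching close to linear. Note that this also lets a single-letter name match, which the original pattern did not.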

^[\u00BF-\u1FFF\u2C00-\uD7FFA-Za-z]+(?:-[\u00BF-\u1FFF\u2C00-\uD7FFA-Za-z]+)?$

See regex demo.

JS snippet:

// Define the anchored pattern once and reuse it for every check.
const nameRe = /^[\u00BF-\u1FFF\u2C00-\uD7FFA-Za-z]+(?:-[\u00BF-\u1FFF\u2C00-\uD7FFA-Za-z]+)?$/;
console.log(nameRe.test('Éve'));       // => true
console.log(nameRe.test('John-Paul')); // => true
console.log(nameRe.test('John316'));   // => false
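For the text-field check described in the question, the pattern could be wrapped in a small helper. The sketch below is only an illustration: the names NAME_RE and isValidFirstName are hypothetical and not from the original post, and the edge cases are ones the question does not list explicitly.

```javascript
// Anchored pattern from the answer above.
const NAME_RE = /^[\u00BF-\u1FFF\u2C00-\uD7FFA-Za-z]+(?:-[\u00BF-\u1FFF\u2C00-\uD7FFA-Za-z]+)?$/;

// Hypothetical helper: true only when the whole input is a valid name.
function isValidFirstName(input) {
  return typeof input === 'string' && NAME_RE.test(input);
}

console.log(isValidFirstName('2pac'));              // => false (digit)
console.log(isValidFirstName('John-'));             // => false (hyphen with nothing after it)
console.log(isValidFirstName('-Paul'));             // => false (leading hyphen)
console.log(isValidFirstName('John--Paul'));        // => false (double hyphen)
console.log(isValidFirstName('Anne-Marie-Louise')); // => false (only one hyphenated part is allowed)
console.log(isValidFirstName('John-Paul'));         // => true
```

If names with several hyphenated parts should be accepted, the trailing ? on the group could be changed to *. The ranges \u00BF-\u1FFF and \u2C00-\uD7FF also cover many non-Latin scripts, so this check is broader than "Latin only".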

Upvotes: 1
