J_z

Reputation: 1083

HTML5 Input Pattern vs. Non-Latin Letters

I want to pre-validate a form input using the new HTML5 pattern attribute. My data is a domain name, so the built-in regex preset of <input type="url"> doesn't apply.

But there is a problem: I can't use A-Za-z, because of damned IDNs (internationalized domain names).

So the question is: is there any way to use <input pattern=""> to validate arbitrary non-English letters?

I tried \w, of course, but it only works for Latin letters...
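For reference, a quick check of the behaviour described above. In JavaScript regular expressions (the syntax pattern uses), `\w` is shorthand for `[A-Za-z0-9_]`, and adding the `u` flag on its own does not widen it to other scripts. The sample strings are illustrative only.

```typescript
// \w means [A-Za-z0-9_]: ASCII letters, digits and underscore only.
const wordOnly = /^\w+$/;
// The u flag alone does not make \w match letters from other scripts.
const wordOnlyUnicode = /^\w+$/u;

for (const s of ["example", "bücher", "пример"]) {
  console.log(s, wordOnly.test(s), wordOnlyUnicode.test(s));
}
// example true true
// bücher false false
// пример false false
```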

Maybe someone has a set of \xNN-\xNN ranges that is guaranteed to cover ALL Unicode alphabetic characters, or knows some other way?

Edit: regarding "This question may already have an answer here:", no, there is no answer there.

Upvotes: 7

Views: 5129

Answers (2)

skovacs1

Reputation: 471

Based on my testing, the HTML5 pattern attribute supports Unicode code points in exactly the same ways that JavaScript does (and doesn't):

  • It only supports \u notation for Unicode code points, so \u00a1 will match '¡'.
  • Because these define characters, you can use them in character ranges such as [\u00a1-\uffff].
  • The . metacharacter matches Unicode characters as well.
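A small sketch of the three points above, written as plain JavaScript/TypeScript regular expressions without the `u` flag. The last two lines are an extra illustration not taken from the answer: without `u`, `.` matches a single UTF-16 code unit, so a character outside the Basic Multilingual Plane (here U+1F600) counts as two.

```typescript
// 1. \uXXXX names a single UTF-16 code unit.
const invertedExclamation = /^\u00a1$/;
// 2. Escapes can be range endpoints: U+00A1 through U+FFFF.
const nonAsciiBmp = /^[\u00a1-\uffff]+$/;
// 3. "." matches non-ASCII characters too.
const anySingle = /^.$/;

console.log(invertedExclamation.test("¡"));  // true
console.log(nonAsciiBmp.test("пример"));     // true (Cyrillic lies inside the range)
console.log(nonAsciiBmp.test("example"));    // false (ASCII lies below U+00A1)
console.log(anySingle.test("\u00e9"));       // true ("é")
// Without the u flag, U+1F600 is two code units, so /^.$/ fails;
// with the u flag, "." matches the whole code point.
console.log(anySingle.test("\u{1F600}"));    // false
console.log(/^.$/u.test("\u{1F600}"));       // true
```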

You don't really specify how you want to pre-validate, so I can't help much beyond that, but by looking up the Unicode code points you care about, you should be able to work out the ranges you need in your regex.
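One alternative to hand-building \uNNNN ranges, not covered in the answer above: Unicode property escapes such as `\p{L}` (any letter), `\p{M}` (combining marks, needed for scripts like Devanagari) and `\p{N}` (digits). They only work when the regex is compiled with the `u` flag or the newer `v` flag. The current HTML specification compiles `pattern` with the `v` flag, so recent browsers should accept them, while older browsers will not. The pattern below is a hypothetical loose IDN check, not a full validator: it does not enforce label lengths or IDNA rules.

```typescript
// Hypothetical loose IDN check: each label starts with a letter or digit
// in any script, then continues with letters, combining marks, digits or
// hyphens; at least two labels separated by dots.
const label = String.raw`[\p{L}\p{N}][\p{L}\p{M}\p{N}\-]*`;
const idnPattern = String.raw`${label}(?:\.${label})+`;

// Anchor it the same way a browser anchors a pattern attribute value.
const idnRegex = new RegExp(`^(?:${idnPattern})$`, "u");

for (const domain of ["bücher.de", "пример.рф", "例え.テスト", "exa mple.com", "localhost"]) {
  console.log(domain, idnRegex.test(domain));
}
// bücher.de true
// пример.рф true
// 例え.テスト true
// exa mple.com false
// localhost false
```

In markup, the same string would go straight into the attribute: pattern="[\p{L}\p{N}][\p{L}\p{M}\p{N}\-]*(?:\.[\p{L}\p{N}][\p{L}\p{M}\p{N}\-]*)+". The hyphen is escaped inside the classes because the `v` flag requires it.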

Keep in mind that pattern matching is fairly basic overall and isn't supported by every browser. I recommend progressive enhancement with some JavaScript on top of the pattern attribute (you can even reuse more or less the same regex).
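A sketch of reusing the same regex from script while mirroring how browsers evaluate `pattern`: the value must match the whole pattern (as if wrapped in `^(?:` and `)$`), an empty value is not checked, and a pattern that fails to compile is ignored rather than rejecting everything. The helper name `matchesPattern` is hypothetical, and using the `u` flag is an assumption; current browsers use the `v` flag, which rejects a few character-class forms that `u` accepts.

```typescript
// Hypothetical helper that mirrors the browser's pattern check.
function matchesPattern(pattern: string, value: string): boolean {
  // Browsers skip the pattern check for an empty value;
  // the separate `required` attribute handles emptiness.
  if (value === "") return true;
  let re: RegExp;
  try {
    // The pattern is implicitly anchored to the whole value.
    re = new RegExp(`^(?:${pattern})$`, "u");
  } catch {
    // An invalid pattern is ignored, so nothing fails validation.
    return true;
  }
  return re.test(value);
}

console.log(matchesPattern("abc", "xabcx"));                         // false: the whole value must match
console.log(matchesPattern(String.raw`[\u00a1-\uffff]+`, "пример")); // true
console.log(matchesPattern("(", "anything"));                        // true: invalid pattern is ignored
```

In the page, the pattern string can be read back from `input.pattern` and the result passed to `input.setCustomValidity()`, so the script and the attribute share one regex.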

As always, never trust user input: it doesn't take a genius to send a request straight to your form endpoint with more or less whatever data they like. Your server-side validation should be stricter and more explicit. Your client-side validation can be more generous, depending on whether false positives or false negatives are more of a problem for your use case.
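As one illustration of stricter server-side validation (a sketch, not part of the original answer): a Node.js service could convert the submitted name to its ASCII (Punycode) form with the WHATWG `URL` parser, which applies IDNA processing to host names, and then check each label against the LDH rule (letters, digits and hyphens, 1 to 63 characters, no leading or trailing hyphen). The function name `toAsciiDomain` and the extra rejection rules (no path or port characters, a letter in the last label) are assumptions made for this example.

```typescript
// Hypothetical server-side check for a bare domain name (Node.js or any
// runtime with the WHATWG URL API). Returns the ASCII form, or null.
const LDH_LABEL = /^[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?$/;

function toAsciiDomain(input: string): string | null {
  // Reject characters that would make the URL parser read a path, port,
  // credentials, query, fragment or percent-encoding instead of a host.
  if (input === "" || /[\/\\?#:@%\s]/.test(input)) return null;
  let host: string;
  try {
    // For http URLs the parser lowercases the host and converts IDN labels to Punycode.
    host = new URL(`http://${input}`).hostname;
  } catch {
    return null; // not a parseable host
  }
  if (host.length > 253) return null;
  const labels = host.split(".");
  if (labels.length < 2) return null;
  if (!labels.every((l) => LDH_LABEL.test(l))) return null;
  // Require a letter in the last label so bare IPv4 addresses are rejected.
  if (!/[a-z]/.test(labels[labels.length - 1])) return null;
  return host;
}

console.log(toAsciiDomain("Bücher.example")); // "xn--bcher-kva.example"
console.log(toAsciiDomain("пример.рф"));      // "xn--e1afmkfd.xn--p1ai"
console.log(toAsciiDomain("-bad.com"));       // null
```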

Upvotes: 3

David

Reputation: 1245

I know this isn't what you want to hear, but...

The HTML5 pattern attribute is there less for the programmer than for the user. So, given the unfortunate limitations of pattern, you are best off providing a "loose" pattern: one that gives no false negatives but allows a few false positives. When I've run into this problem, I found the best approach was a pattern consisting of a blacklist plus a couple of minimum requirements. Hopefully that can be done in your case.
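One possible reading of the "blacklist plus minimum requirements" approach for domain names, as a hypothetical pattern: forbid whitespace and the URL delimiters / ? # @ :, then require at least one dot followed by a final label of two or more characters. Because it says nothing about which letters are allowed, it should not reject typical IDNs, at the cost of letting some invalid names through. The `/` is escaped inside the classes, which should keep the pattern valid under the stricter `v` flag as well as `u`.

```typescript
// Hypothetical loose pattern: a blacklist of characters that never belong
// in a typed-in domain name, plus two minimum requirements (a dot, and a
// final label of at least two characters).
const LOOSE_DOMAIN_PATTERN = String.raw`[^\s\/?#@:]+\.[^\s\/?#@:.]{2,}`;

// Anchored the same way a browser anchors a pattern attribute value.
const loose = new RegExp(`^(?:${LOOSE_DOMAIN_PATTERN})$`, "u");

for (const v of ["bücher.de", "пример.рф", "example.com", "localhost", "a b.com", "http://x.com"]) {
  console.log(v, loose.test(v));
}
// bücher.de true
// пример.рф true
// example.com true
// localhost false
// a b.com false
// http://x.com false
```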

Upvotes: 0
