Greg

Reputation: 7922

JavaScript regex to reject non-US-ASCII characters

^[^\x00-\x1F\x7F-\xFF]+$

This regex correctly fails to match a string that contains non-printing characters (hex 00-1F) or extended ASCII characters (hex 80-FF), but, unlike in PHP, it lets non-ASCII UTF-8 characters pass (e.g. 日本واستقرارهहिन्दीދިވެހިބަސްગુજરાતી한).

Looking at the Wikipedia page on UTF-8, all of those should fall in the 80-FF range. Does anyone know what I'm missing?

Also, if you could explain how to ignore quoted text, you would be my hero forever.

Upvotes: 1

Views: 3225

Answers (1)

Delan Azabani

Reputation: 81492

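Hmm... JavaScript regexes match against the string's UTF-16 code units, not its UTF-8 bytes, so \x7F-\xFF only covers U+007F to U+00FF and anything above that slips through. Instead of rejecting byte ranges, try matching the actual Unicode characters you want to allow, e.g.: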

^[\u0020-\u007e]+$
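
To see how the two patterns differ, here is a minimal sketch. The sample strings and variable names are my own, not from the question, so treat it as an illustration rather than a complete validator.

```javascript
// Pattern from the question: rejects control characters and U+007F-U+00FF only.
const rejectRanges = /^[^\x00-\x1F\x7F-\xFF]+$/;
// Allow-list pattern: accepts only printable US-ASCII (space through tilde).
const printableAscii = /^[\u0020-\u007e]+$/;

// Hypothetical test inputs chosen for illustration.
const samples = ["hello world", "caf\u00e9", "日本", "tab\there"];

for (const s of samples) {
  console.log(JSON.stringify(s), rejectRanges.test(s), printableAscii.test(s));
}
// "hello world" passes both patterns.
// "café" and "tab\there" fail both patterns.
// "日本" passes rejectRanges but fails printableAscii.
```

An allow-list like this is usually easier to reason about than a deny-list, since any character you did not think of is rejected by default.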

Upvotes: 9
