Ian McIntyre Silber

Reputation: 5663

Regex to reject non-english characters?

Is there a simple regex that will catch all non-English characters? It would need to allow common punctuation and symbols, but reject characters from other alphabets, such as Russian, Japanese, etc.

Looking for something to work in PHP.

Upvotes: 2

Views: 3224

Answers (5)

abstream

Reputation: 1

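// utf8_decode() turns each multi-byte UTF-8 character into a single byte,
// so the byte lengths match only if $str is plain ASCII.
if (strlen($str) == strlen(utf8_decode($str))) {
    // $str contains only ASCII characters
}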
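The check above works because utf8_decode() collapses every multi-byte UTF-8 sequence into one byte, so the lengths only match when the string is ASCII. utf8_decode() is deprecated as of PHP 8.2, so here is a minimal sketch of the same test done with a regex, assuming the input is valid UTF-8 (the function name is_ascii is made up for illustration):

```php
<?php
// Hypothetical helper: true if every byte of $str is in the ASCII range.
// For valid UTF-8 input this gives the same answer as the
// strlen()/utf8_decode() comparison, without the deprecated function.
function is_ascii(string $str): bool
{
    // \A and \z anchor the whole string. No /u modifier is needed, because
    // every byte of a multi-byte UTF-8 character is >= \x80.
    return preg_match('/\A[\x00-\x7F]*\z/', $str) === 1;
}

var_dump(is_ascii("123 Main St.")); // bool(true)
var_dump(is_ascii("Москва"));       // bool(false)
```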

Upvotes: -1

Walf

Reputation: 9318

Use hex codes in a character class. For example, this removes all non-ASCII characters as well as line endings, replacing them with spaces. The space character (\x20) is deliberately left out of the range, so that consecutive runs of spaces and/or special characters are replaced with a single space.

$clean = trim(preg_replace('/[^\x21-\x7E]+/', ' ', $input));
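A quick usage sketch of that one-liner wrapped in a function (the name clean_to_ascii is hypothetical). Since the pattern has no u modifier it works on bytes, and every byte of a non-ASCII UTF-8 character falls outside \x21-\x7E:

```php
<?php
// Hypothetical wrapper around the preg_replace() call above: each run of
// spaces, control characters (such as line endings) or non-ASCII bytes
// is collapsed into a single space, then the ends are trimmed.
function clean_to_ascii(string $input): string
{
    return trim(preg_replace('/[^\x21-\x7E]+/', ' ', $input));
}

echo clean_to_ascii("Hello,\nМир world!"), "\n"; // Hello, world!
echo clean_to_ascii("a\t\tb"), "\n";             // a b
```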

Upvotes: 0

Linus Kleen

Reputation: 34632

Since in your comment you're referring to addresses, they might contain digits too. So:

$clean = preg_replace('/[^[:alpha:][:punct:][:digit:]]/u', '', utf8_encode($input));

This should remove your unwanted characters. The [:alpha:] class only works as intended if your locale is set up correctly, though. If, for example, it is set to de_DE, not only "a" through "z" are regarded as letters, but also "exotics" like "ä", "ö", "è", and the like.

Also, since you don't want "Russian, Japanese, etc.", note the u modifier: the input has to be UTF-8 encoded, otherwise the pattern may break it or give you wrong results.
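A sketch of the call with the arguments in the order preg_replace() expects (pattern, replacement, subject), assuming $input is already UTF-8 so utf8_encode() is skipped. Because what [:alpha:] and [:punct:] match can depend on the locale, and with the u modifier PCRE may treat them as Unicode classes, this variant lists ASCII letters, digits, space and ASCII punctuation explicitly. It also keeps spaces, which the pattern above would strip out of an address. The name keep_english_chars is hypothetical:

```php
<?php
// Hypothetical helper. Assumption: $input is already UTF-8.
// Allowed: A-Z, a-z, 0-9, space, and the ASCII punctuation ranges
// \x21-\x2F, \x3A-\x40, \x5B-\x60, \x7B-\x7E (together: printable ASCII).
function keep_english_chars(string $input): string
{
    $pattern = '/[^A-Za-z0-9 \x21-\x2F\x3A-\x40\x5B-\x60\x7B-\x7E]/';
    // Argument order: pattern, replacement, subject.
    return preg_replace($pattern, '', $input);
}

echo keep_english_chars("12 Main St., Москва"), "\n"; // "12 Main St., "
```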

Upvotes: 2

Ian McIntyre Silber

Reputation: 5663

This Q&A seemed to handle it: "PHP Validate string characters are UK or US Keyboard characters"

Upvotes: 0

Tadas Šubonis

Reputation: 1600

Perhaps something like this one: [^A-Za-z0-9\,\.\-]
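A small sketch of how that class could be used to reject input with preg_match() (has_disallowed_chars is a hypothetical name). As written, the class also flags spaces and any punctuation other than comma, period and hyphen, so those would need to be added if they should be allowed:

```php
<?php
// Hypothetical helper: true if $str contains at least one character that is
// not an ASCII letter, digit, comma, period or hyphen (class from the answer above).
function has_disallowed_chars(string $str): bool
{
    return preg_match('/[^A-Za-z0-9\,\.\-]/', $str) === 1;
}

var_dump(has_disallowed_chars("abc-123,4.5")); // bool(false)
var_dump(has_disallowed_chars("Привет"));      // bool(true)
var_dump(has_disallowed_chars("Main St."));    // bool(true), because of the space
```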

Upvotes: 0
