seb

Reputation: 313

Accent characters in a regex

I am using this regex to accept accented characters:

/^([\p{L}a-zA-Z ,-]*)$/i

When I test my regex on this website, http://rubular.com/r/MRESYEGO2d, everything is OK, but when I use the same regex in my PHP code it does not work.

$alphaNumCity = "/^([\p{L}a-zA-Z0-9 ,-]*)$/i";
if (preg_match($alphaNumCity, $champ)) {
    echo "<label for='tags'>Villes<span style='color:red;'>*</span></label><input id='tags' name='businessVille' value='".$champ."' required />";
} else {
    echo "<label for='tags'>Villes<span style='color:red;'>(entrer un nom de ville valide)*</span></label><input id='tags' name='businessVille' required />";
    $valide = false;
}

This code always ends up in the else branch.

I don't understand why it works at http://rubular.com/r/MRESYEGO2d but not in my code.

Upvotes: 0

Views: 167

Answers (2)

000

Reputation: 27227

Add the Unicode flag "u": /^([\p{L}a-zA-Z0-9 ,-]*)$/iu. It comes with some caveats, quoted here from the notes in the PHP documentation:

Regarding the validity of a UTF-8 string when using the /u pattern modifier, some things to be aware of:

  1. If the pattern itself contains an invalid UTF-8 character, you get an error (as mentioned in the docs above: "UTF-8 validity of the pattern is checked since PHP 4.3.5").

  2. When the subject string contains invalid UTF-8 sequences / codepoints, it basically results in a "quiet death" for the preg_* functions, where nothing is matched but without indication that the string is invalid UTF-8.

  3. PCRE regards five and six octet UTF-8 character sequences as valid (both in patterns and the subject string) but these are not supported in Unicode (see section 5.9 "Character Encoding" of the "Secure Programming for Linux and Unix HOWTO", which can be found at http://www.tldp.org/ and other places).

  4. For an example algorithm in PHP which tests the validity of a UTF-8 string (and discards five / six octet sequences) head to: http://hsivonen.iki.fi/php-utf8/

See the documentation for a code sample and further information: http://www.php.net/manual/en/reference.pcre.pattern.modifiers.php#54805
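
The "quiet death" in point 2 can be made visible by checking the return value of preg_match() and calling preg_last_error(). The following is only a minimal sketch: the function name checkCity and its string return values are hypothetical, chosen for illustration, and the sample inputs are assumptions rather than data from the question.

```php
<?php
// Hypothetical helper: the name and the returned labels are illustrative only.
function checkCity(string $input): string
{
    // The /u modifier makes PCRE treat both pattern and subject as UTF-8.
    $result = preg_match('/^[\p{L}0-9 ,-]*$/u', $input);

    if ($result === false) {
        // preg_match() returns false on error; preg_last_error() says why.
        return preg_last_error() === PREG_BAD_UTF8_ERROR
            ? 'invalid-utf8'
            : 'regex-error';
    }

    return $result === 1 ? 'match' : 'no-match';
}

// "é" encoded as UTF-8 (two bytes) versus ISO-8859-1 (one byte).
$utf8City   = "Montr\xC3\xA9al";
$latin1City = "Montr\xE9al";

echo checkCity($utf8City), "\n";
echo checkCity($latin1City), "\n";
```

If the second call reports invalid UTF-8, the input is probably arriving in another encoding (for example from a form or database that is not set to UTF-8), and it needs converting before the regex can match.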

Upvotes: 1

Jerry

Reputation: 71538

Use the unicode flag (or unicode modifier):

/^([\p{L}a-zA-Z ,-]*)$/iu
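
As a rough sketch of the difference, assuming the input string is UTF-8 encoded (the variable names and the sample city are illustrative, not taken from the question):

```php
<?php
// Sample input; "é" is written as its two UTF-8 bytes so the file encoding does not matter.
$city = "Montr\xC3\xA9al";

// Without /u, PCRE reads the subject byte by byte, so the two bytes of "é"
// are tested separately and the match may fail.
$withoutU = preg_match('/^([\p{L}a-zA-Z ,-]*)$/i', $city);

// With /u, the subject is decoded as UTF-8 and "é" is a single letter for \p{L}.
$withU = preg_match('/^([\p{L}a-zA-Z ,-]*)$/iu', $city);

var_dump($withoutU);
var_dump($withU);
```

With the u modifier the a-zA-Z range is redundant, since \p{L} already covers those letters; it is kept here only to stay close to the original pattern.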

Upvotes: 1
