findAffs

Reputation: 103

PHP Regex for Accented Characters

I am trying to filter a variable so that it keeps only alphanumeric characters, spaces, accented characters, and single quotes, and replaces everything else with a space. A string like:

substitué à une otage % ? vendredi 23 mars lors de l’attaque

should output:

substitué à une otage vendredi 23 mars lors de l’attaque

but the output I actually get is:

substitué à une otage vendredi 23 mars lors de l

Could you please help? This is my code:

$whitelist = "/[^a-zA-Z0-9а-àâáçéèèêëìîíïôòóùûüÂÊÎÔúÛÄËÏÖÜÀÆæÇÉÈŒœÙñý',. ]/";

$descreption =  preg_replace($whitelist, ' ', $ds);
}else{
    $errors = self::DESCREPTION_ERROR;
    return false;
}

Upvotes: 0

Views: 1797

Answers (3)

Gras Double

Reputation: 16373

You may have a look at Unicode character properties.

Summary of my changes:

  • use \p{L} to match all letters
  • escape the hyphen (\-)
  • support typewriter (') and typographic (’) apostrophes

Here is the result:

$whitelist = '/[^\p{L}0-9\-\'’,. ]/u';

There is probably room for even further improvement. Finally, don't forget to add the u modifier!
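To illustrate why the u modifier matters, here is a minimal sketch, assuming the script and the input are UTF-8 encoded. Without u, the pattern and the subject are processed byte by byte, so a class that lists accented letters really lists their individual bytes. The shortened class in the second call (`À`, `Ù`) is a hypothetical reduction of the asker's whitelist, chosen only to show the effect; this may be what produced the truncated output in the question, but that is an inference.

```php
<?php
// Assumes the source file and the input are UTF-8 encoded.
$input = 'lors de l’attaque';

// With the u modifier, the typographic apostrophe is one character and is kept.
$withU = preg_replace('/[^\p{L}0-9\-\'’,. ]/u', ' ', $input);

// Without u, the class is a list of bytes. À is C3 80 and Ù is C3 99, so the
// bytes 80 and 99 are allowed, but the lead byte E2 of ’ (E2 80 99) is not.
$withoutU = preg_replace("/[^a-zÀÙ' ]/", ' ', $input);

var_dump($withU === $input);                 // bool(true)
var_dump(preg_match('//u', $withoutU) === 1); // bool(false): no longer valid UTF-8
```

The second result contains a stray `80 99` byte pair, which later steps (output escaping, a database driver) may reject or cut off.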

Upvotes: 1

Nick

Reputation: 147146

One way to deal with the range of accented characters is to use the POSIX [:alnum:] class, which in PHP, in conjunction with the u modifier, matches all of them. Put it into a negated character class together with the other characters you want to keep, and everything else gets replaced:

$string = 'substitué à une otage % ? vendredi 23 mars lors de l’attaque';
echo preg_replace("/[^[:alnum:]'’,.]/u", ' ', $string);

Output:

substitué à une otage vendredi 23 mars lors de l’attaque

As has been pointed out in the comments, ’ is not the same as ', so it also needs to be added to the set of characters you want to keep.

Demo on 3v4l.org
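Because every rejected character becomes its own space, the raw result contains a run of spaces where `% ?` used to be. If single spaces are wanted, a second pass can squeeze them. This is a sketch of one possible follow-up step, not part of the answer above; the variable names `$replaced` and `$collapsed` are illustrative, and UTF-8 input is assumed.

```php
<?php
// Assumes the source file and the input are UTF-8 encoded.
$string = 'substitué à une otage % ? vendredi 23 mars lors de l’attaque';

// Step 1: replace everything outside the whitelist with a space
// (the pattern from the answer above).
$replaced = preg_replace("/[^[:alnum:]'’,.]/u", ' ', $string);

// Step 2 (an addition, not from the answer): squeeze runs of spaces
// into one and trim the ends.
$collapsed = trim(preg_replace('/ {2,}/', ' ', $replaced));

echo $collapsed, PHP_EOL;
// substitué à une otage vendredi 23 mars lors de l’attaque
```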

Upvotes: 1

maio290

Reputation: 6732

Your regex is faulty. The part а-à gives the error Character range is out of order - I guess the - was added by mistake there...

Then a small hint: ’ is not '

[^a-zA-Z0-9àâáçéèèêëìîíïôòóùûüÂÊÎÔúÛÄËÏÖÜÀÆæÇÉÈŒœÙñý'’,. ] 

should work fine.

Also, if you're working with regular expressions, tools like RegExr or regex101 are really helpful.
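The out-of-order range can also be detected from PHP itself. The sketch below assumes the first character of the range in the question is a Cyrillic а (U+0430), which would explain the error, and that the pattern is compiled with the u modifier so that ranges are compared by code point. It uses the `\u{...}` string escapes available since PHP 7 so the two look-alike letters stay distinguishable.

```php
<?php
// Assumption: the range in the question runs from Cyrillic а (U+0430)
// to Latin à (U+00E0), i.e. backwards in code-point order.
$bad = "/[^a-z\u{0430}-\u{00E0}]/u";

// Listing à on its own, with no range, compiles fine.
$good = "/[^a-z\u{00E0}]/u";

// preg_replace() returns null when the pattern fails to compile.
// A warning is raised as well; @ hides it in this demo.
$badResult  = @preg_replace($bad, ' ', 'à');
$goodResult = preg_replace($good, ' ', 'à');

var_dump($badResult === null); // bool(true)
var_dump($goodResult);         // string(2) "à"
```

Checking for a null return is a cheap way to catch a broken pattern before its result is used.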

Upvotes: 3
