Reputation: 662
How can I remove all non-language characters?
I want to remove characters like the ones below, along with every other non-language character:
I am using this:
preg_replace("/[^a-z0-9A-Z\-\'\|\!\.\?\:\)\(\;\*\"]/u", " ", $text );
This works well for English, but I need to allow characters from all languages, such as Russian, Arabic, Hebrew, Japanese...
Are there any string functions I can use to keep all language characters?
Thanks.
Upvotes: 1
Views: 4362
Reputation: 2599
Tim Pietzcker's answer did not work in my case.
This works for me:
$after = preg_replace('/[^\w\s]+/u','' , $before);
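To make the behaviour of this pattern concrete, here is a minimal sketch that wraps it in a function. The name `strip_non_word` and the sample strings are made up for illustration. It assumes a PHP build where the `/u` modifier also makes `\w` Unicode-aware, which should be the case on current versions. Note that this pattern removes punctuation too and keeps the underscore, because `_` counts as a word character.

```php
<?php
// Hypothetical helper; only the pattern comes from the answer above.
function strip_non_word(string $before): string
{
    // Delete every run of characters that is neither a word character
    // (letter, digit, underscore) nor whitespace.
    $after = preg_replace('/[^\w\s]+/u', '', $before);

    // preg_replace() returns null on error, e.g. for invalid UTF-8 input.
    return $after ?? $before;
}

// Commas, exclamation marks and symbols are dropped; letters from any
// script, digits, underscores and whitespace are kept.
echo strip_non_word("Hello, мир! שלום_123"), "\n";
```

If the punctuation should survive, this variant is probably not the one you want.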
Upvotes: 1
Reputation: 336098
No regex will be perfect for what you want; language and writing are just too complex for that. But a reasonable approximation would be:
preg_replace('/[^\p{L}\p{M}\p{Z}\p{N}\p{P}]/u', ' ', $text);
This replaces with a space every character that does not have one of the Unicode properties “letter”, “mark”, “separator”, “number” or “punctuation”.
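As a rough illustration of what the property classes keep and drop, here is a small sketch built around the same pattern. The function name `keep_language_chars` and the sample inputs are assumptions made up for this example, not part of the original answer.

```php
<?php
// Hypothetical helper; only the pattern comes from the answer above.
function keep_language_chars(string $text): string
{
    // Replace each character that is not a letter (L), mark (M),
    // separator (Z), number (N) or punctuation (P) with one space.
    $result = preg_replace('/[^\p{L}\p{M}\p{Z}\p{N}\p{P}]/u', ' ', $text);

    // preg_replace() returns null on error, e.g. for invalid UTF-8 input.
    return $result ?? $text;
}

// Letters and punctuation from any script pass through unchanged,
// while symbols such as "+" or "☺" are replaced by a space.
echo keep_language_chars("Привет ☺ мир"), "\n";
echo keep_language_chars("1+1"), "\n";
```

One thing to watch for: tabs and newlines are control characters rather than separators, so this pattern turns them into spaces as well. Adding `\s` inside the character class should preserve them if that matters.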
Upvotes: 11