Einius

Reputation: 1382

PHP remove all non UTF-8 characters from string

I need to remove symbols such as ",./! from the beginning and the end of a string, while keeping digits and letters, including UTF-8 characters such as ąčęėįšųž. For example:

  1. the result of string &g&g should be g&g;
  2. the result of string ąčęėį should be ąčęėį;
  3. the result of string "name" should be name;
  4. the result of string 69 should be 69;
  5. the result of string --abc--- should be abc.

I believe this can be done with preg_replace, but I can't work out how.

Upvotes: 1

Views: 3628

Answers (1)

Toto

Reputation: 91375

If I understand correctly, this will do what you want:

$result = preg_replace('/(?:^[^\p{L}\p{N}]+|[^\p{L}\p{N}]+$)/u', '', $input);

Where

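\p{L} matches any Unicode letter
\p{N} matches any Unicode numeric character (decimal digits, plus letter-like and other numbers such as Roman numerals and fractions; use \p{Nd} for decimal digits only)
[^\p{L}\p{N}] is a negated character class that matches any character that is neither a letter nor a number
^ and $ anchor the two alternatives to the start and the end of the string, so characters in the middle are left alone
the u modifier makes PCRE treat the pattern and the subject as UTF-8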
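
To check the pattern against the examples from the question, here is a minimal sketch. The function name trim_non_alnum is hypothetical (it is not part of PHP), and falling back to the original string when preg_replace fails is an assumption about what you would want in that case:

```php
<?php
// Hypothetical helper: wraps the preg_replace call shown above.
function trim_non_alnum(string $input): string
{
    // Strip leading and trailing runs of characters that are
    // neither Unicode letters nor Unicode numbers.
    $result = preg_replace('/(?:^[^\p{L}\p{N}]+|[^\p{L}\p{N}]+$)/u', '', $input);

    // With the u modifier, preg_replace returns null when the subject
    // is not valid UTF-8. Assumption: return the input unchanged then.
    return $result ?? $input;
}

// The five strings from the question.
$samples = ['&g&g', 'ąčęėį', '"name"', '69', '--abc---'];

foreach ($samples as $sample) {
    echo $sample, ' => ', trim_non_alnum($sample), PHP_EOL;
}
```

For the five strings in the question this should print g&g, ąčęėį, name, 69 and abc. Symbols inside the string, like the & in g&g, are kept because both alternatives are anchored.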

Upvotes: 3
