Reputation: 346
I receive text in the UTF-8 character set and want to convert it to ASCII in PHP, replacing every character that ASCII does not support with a space. The code I currently use is:
$input_encoding = mb_detect_encoding($toClean);
mb_substitute_character("long");
$encoded = mb_convert_encoding($toClean, "ASCII", "auto");
The output now contains sequences like "testU+2013ng"; I want this U+2013 to be replaced with a space. I tried the regular expression below:
$encoded = preg_replace("~U\+[\d\w]{4}~", " ", $encoded);
Now the output looks like "Road ' +CB9 +CA4 +CAEU+". How do I remove all the unsupported characters using preg or something similar?
Upvotes: 1
Views: 295
Reputation: 10852
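The pattern assumes exactly four characters after U+, but the codes in your output are hexadecimal and not always four characters long (for example U+CB9), so [\d\w]{4} can swallow the U of the next code. Note that \d alone would also miss the hex letters A-F. Matching one or more hex digits avoids both problems:
U\+[0-9A-F]+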
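Stripping the markers after conversion stays fragile, because a marker can run directly into ordinary text that happens to start with 0-9 or A-F. An alternative is to have mbstring emit the space itself: mb_substitute_character() also accepts an integer code point. A minimal sketch, assuming the input is valid UTF-8 and PHP 7 or later; toAsciiWithSpaces is a hypothetical helper name, not something from the question:

```php
<?php
// Sketch: substitute a space directly instead of producing "U+XXXX"
// markers and removing them with a regex afterwards.
// toAsciiWithSpaces() is a hypothetical helper name used for illustration.
function toAsciiWithSpaces(string $text): string
{
    $previous = mb_substitute_character();  // remember the current setting
    mb_substitute_character(32);            // 32 is the code point of a space
    // Assumption: the input is UTF-8, so the source encoding is stated explicitly.
    $ascii = mb_convert_encoding($text, 'ASCII', 'UTF-8');
    mb_substitute_character($previous);     // restore the earlier setting
    return $ascii;
}

echo toAsciiWithSpaces("test\u{2013}ng"), "\n";  // prints "test ng"
```

Each character that has no ASCII equivalent becomes one space, so no regex pass is needed. Restoring the previous substitute character keeps the change from affecting later conversions in the same request.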
Upvotes: 1