Reputation: 4665
My sentence include ASCII character codes like
"#$%
How can I remove all ASCII codes?
I tried strip_tags()
, html_entity_decode()
, and htmlspecialchars()
, and they did not work.
Upvotes: 1
Views: 4726
Reputation: 11393
To remove Japanese characters from a string, you may use the following code:
// Decode the text to get correct UTF-8 text:
$text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');
// Use the UTF-8 properties with `preg_replace` to remove all Japanese characters
$text = preg_replace('/\p{Katakana}|\p{Hiragana}|\p{Han}/u', '', $text);
Documentation:
Unicode character properties
Unicode scripts
Some languages are composed of multiple scripts. There is no Japanese Unicode script. Instead, Unicode offers the Hiragana, Katakana, Han and Latin scripts that Japanese documents are usually composed of.
Upvotes: -1
Reputation: 5700
Are you trying to remove entities that resolve to non-ascii characters? If that is what you want you can use this code:
$str = '" # $ % 琔'; // " # $ % 琔
// decode entities
$str = html_entity_decode($str, ENT_QUOTES, 'UTF-8');
// remove non-ascii characters
$str = preg_replace('/[^\x{0000}-\x{007F}]/u', '', $str);
Or
// decode only iso-8859-1 entities
$str = html_entity_decode($str, ENT_QUOTES, 'iso-8859-1');
// remove any entities that remain
$str = preg_replace('/&#(x[0-9]{4}|\d+);/', '', $str);
If that's not what you want you need to clarify the question.
Upvotes: 2
Reputation: 197624
If you have the multibyte string extension at hand, this works:
$string = '"#$%';
mb_convert_encoding($string, 'UTF-8', 'HTML-ENTITIES');
Which does give:
"#$%
Loosely related is:
With the DOM extension you could load it and convert it to a string which probably has the benefit to better deal with HTML elements and such:
echo simplexml_import_dom(@DomDocument::loadHTML('"#$%'))->xpath('//body/p')[0];
Which does output:
"#$%
If it contains HTML, you might need to export the inner html of that element which is explained in some other answer:
Upvotes: 1
Reputation: 43884
You could run this if you don't want the returning values:
preg_replace('/(&#x[0-9]{4};)/', '', $text);
But be warned. This is basically a nuker and with the way HTML entities work I am sure this will interfer with other parts of your string. I would recommend leaving them in personally and encoding them as @hakra shows.
Upvotes: 2