user198989

Reputation: 4665

How to remove all ASCII codes from a string

My sentence includes ASCII character codes like

&#x0022;&#x0023;&#x0024;&#x0025;

How can I remove all ASCII codes?

I tried strip_tags(), html_entity_decode(), and htmlspecialchars(), and they did not work.

Upvotes: 1

Views: 4726

Answers (4)

Jocelyn

Reputation: 11393

To remove Japanese characters from a string, you may use the following code:

// Decode the text to get correct UTF-8 text:
$text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');

// Use Unicode script properties with `preg_replace` to remove all Japanese characters
$text = preg_replace('/\p{Katakana}|\p{Hiragana}|\p{Han}/u', '', $text);

Documentation:

Unicode character properties
Unicode scripts

Some languages are composed of multiple scripts. There is no Japanese Unicode script. Instead, Unicode offers the Hiragana, Katakana, Han and Latin scripts that Japanese documents are usually composed of.
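As a small usage sketch of the approach above: the helper name remove_japanese_scripts and the sample string are made up for illustration, and the input is assumed to carry its Japanese text as numeric entities.

```php
<?php
// Hypothetical helper; the name is an assumption made for this sketch.
function remove_japanese_scripts(string $text): string
{
    // Turn entities such as &#x3042; into real UTF-8 characters first.
    $decoded = html_entity_decode($text, ENT_QUOTES, 'UTF-8');

    // Drop every character whose Unicode script is Hiragana, Katakana or Han.
    // preg_replace() returns null on failure, so fall back to the decoded text.
    return preg_replace('/[\p{Hiragana}\p{Katakana}\p{Han}]/u', '', $decoded) ?? $decoded;
}

// &#x3042; is Hiragana, &#x30A2; is Katakana, &#x6F22; is Han.
echo remove_japanese_scripts('abc&#x3042;&#x30A2;&#x6F22;def'), "\n"; // prints abcdef
```

Characters from other scripts, including Latin letters and punctuation, are left untouched.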

Upvotes: -1

mcrumley

Reputation: 5700

Are you trying to remove entities that resolve to non-ASCII characters? If that is what you want, you can use this code:

$str = '&#x0022; &#x0023; &#x0024; &#x0025; &#x7414;'; // " # $ % 琔
// decode entities
$str = html_entity_decode($str, ENT_QUOTES, 'UTF-8');
// remove non-ascii characters
$str = preg_replace('/[^\x{0000}-\x{007F}]/u', '', $str);

Or

// decode only iso-8859-1 entities
$str = html_entity_decode($str, ENT_QUOTES, 'iso-8859-1');
// remove any entities that remain
$str = preg_replace('/&#(x[0-9a-fA-F]+|\d+);/', '', $str); // hex entities may contain a-f

If that's not what you want, you need to clarify the question.
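The first variant can be wrapped in a function to make the two steps easier to follow. This is only a sketch: the name decode_and_strip_non_ascii and the sample input are assumptions, not part of the original answer.

```php
<?php
// Hypothetical helper; the name is an assumption made for this sketch.
function decode_and_strip_non_ascii(string $str): string
{
    // Step 1: decode named and numeric entities into UTF-8 characters.
    $decoded = html_entity_decode($str, ENT_QUOTES, 'UTF-8');

    // Step 2: remove every code point above U+007F.
    // preg_replace() returns null on failure, so fall back to the decoded text.
    return preg_replace('/[^\x00-\x7F]/u', '', $decoded) ?? $decoded;
}

// The two ASCII entities survive as " and #; the e-acute and the Hiragana letter are dropped.
echo decode_and_strip_non_ascii('&#x0022;&#x0023; caf&eacute;&#x3042; ok'), "\n"; // prints "# caf ok
```

Note that the ASCII entities end up decoded rather than removed, which may or may not be what the question is after.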

Upvotes: 2

hakre

Reputation: 197624

If you have the multibyte string extension at hand, this works:

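$string = '&#x0022;&#x0023;&#x0024;&#x0025;';
// Note: the 'HTML-ENTITIES' encoding is deprecated as of PHP 8.2.
echo mb_convert_encoding($string, 'UTF-8', 'HTML-ENTITIES');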

Which does give:

"#$%

Loosely related is:


With the DOM extension you could load it and convert it to a string, which probably has the benefit of dealing better with HTML elements and such:

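$doc = new DOMDocument();
@$doc->loadHTML('&#x0022;&#x0023;&#x0024;&#x0025;'); // loadHTML() is not static; @ silences parser warnings
echo simplexml_import_dom($doc)->xpath('//body/p')[0];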

Which does output:

"#$%

If it contains HTML, you might need to export the inner html of that element which is explained in some other answer:
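For the plain decoding step at the top of this answer, here is a sketch of an alternative that needs no extension, since the 'HTML-ENTITIES' encoding of mb_convert_encoding() is deprecated as of PHP 8.2. It assumes the input uses numeric entities like the sample string below.

```php
<?php
// Sample input assumed for this sketch: four numeric entities for " # $ %.
$string = '&#x0022;&#x0023;&#x0024;&#x0025;';

// html_entity_decode() is part of the PHP core and decodes named and
// numeric entities; ENT_QUOTES makes it decode quote entities as well.
$decoded = html_entity_decode($string, ENT_QUOTES | ENT_HTML5, 'UTF-8');

echo $decoded, "\n"; // prints "#$%
```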

Upvotes: 1

Sammaye

Reputation: 43884

You could run this if you want to strip the entities instead of getting the decoded characters back:

$text = preg_replace('/&#x[0-9a-fA-F]+;/', '', $text); // assign the result; hex digits include a-f

But be warned: this is basically a nuker, and given the way HTML entities work, I am sure it will interfere with other parts of your string. Personally, I would recommend leaving them in and converting them as @hakre shows.
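A less destructive variant is to inspect each numeric entity and drop only those outside the ASCII range. The sketch below is one possible way to do that with preg_replace_callback(); the function name strip_non_ascii_entities is an assumption, and named entities such as &eacute; are deliberately not handled.

```php
<?php
// Hypothetical helper; the name is an assumption made for this sketch.
function strip_non_ascii_entities(string $text): string
{
    $result = preg_replace_callback(
        // Group 1 captures hex digits (&#x...;), group 2 decimal digits (&#...;).
        '/&#(?:x([0-9a-f]+)|(\d+));/i',
        function (array $m): string {
            // For a decimal entity, group 1 is an empty string.
            $codePoint = $m[1] !== '' ? hexdec($m[1]) : (int) $m[2];

            // Keep entities in the ASCII range, remove all others.
            return $codePoint <= 0x7F ? $m[0] : '';
        },
        $text
    );

    // preg_replace_callback() returns null on failure; fall back to the input.
    return $result ?? $text;
}

echo strip_non_ascii_entities('a&#x0022;b&#x3042;c&#233;d&#65;'), "\n"; // prints a&#x0022;bcd&#65;
```

The ASCII-range entities stay in the string as entities, so they can still be decoded later if needed.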

Upvotes: 2
