Reputation: 1168
I need a function that will clean a string's special characters. I do NOT want this to convert HTML characters like
<br />
to
&lt;br /&gt;
I want to convert things like •, ½, and ’ to HTML entities.
This is the function I currently use, but it doesn't appear to work with the fractions.
function cleanText($str){
    // Replace each special character with its named HTML entity.
    $str = str_replace("Ñ", "&Ntilde;", $str);
    $str = str_replace("ñ", "&ntilde;", $str);
    $str = str_replace("Á", "&Aacute;", $str);
    $str = str_replace("á", "&aacute;", $str);
    $str = str_replace("É", "&Eacute;", $str);
    $str = str_replace("é", "&eacute;", $str);
    $str = str_replace("ú", "&uacute;", $str);
    $str = str_replace("ù", "&ugrave;", $str);
    $str = str_replace("Í", "&Iacute;", $str);
    $str = str_replace("í", "&iacute;", $str);
    $str = str_replace("Ó", "&Oacute;", $str);
    $str = str_replace("ó", "&oacute;", $str);
    $str = str_replace("“", "&ldquo;", $str);
    $str = str_replace("”", "&rdquo;", $str);
    $str = str_replace("‘", "&lsquo;", $str);
    $str = str_replace("’", "&rsquo;", $str);
    $str = str_replace("—", "&mdash;", $str);
    $str = str_replace("–", "&ndash;", $str);
    $str = str_replace("™", "&trade;", $str);
    $str = str_replace("ü", "&uuml;", $str);
    $str = str_replace("Ü", "&Uuml;", $str);
    $str = str_replace("Ê", "&Ecirc;", $str);
    $str = str_replace("ê", "&ecirc;", $str);
    $str = str_replace("Ç", "&Ccedil;", $str);
    $str = str_replace("ç", "&ccedil;", $str);
    $str = str_replace("È", "&Egrave;", $str);
    $str = str_replace("è", "&egrave;", $str);
    $str = str_replace("•", "&bull;", $str);
    $str = str_replace("¼", "&frac14;", $str);
    $str = str_replace("½", "&frac12;", $str);
    $str = str_replace("¾", "&frac34;", $str);
    return $str;
}
Upvotes: 2
Views: 12515
Reputation: 19552
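You can replace your entire function with htmlentities using the ENT_SUBSTITUTE flag. It will run much faster and will also cover characters your list misses.
Note: ENT_SUBSTITUTE is available as of PHP 5.4. It replaces invalid byte sequences with the Unicode replacement character instead of making the function return an empty string.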
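One caveat: htmlentities on its own also encodes `<`, `>`, `&` and quotes, which the question wants to avoid. A minimal sketch of one common workaround follows: encode everything, then decode only the markup characters with htmlspecialchars_decode. The helper name `encodeEntitiesKeepTags` is hypothetical, and the sketch assumes the input string and the source file are UTF-8.

```php
<?php
// Hypothetical helper: the name is illustrative and not from this thread.
// Assumes $str is UTF-8.
function encodeEntitiesKeepTags(string $str): string
{
    // Step 1: encode every character that has a named entity,
    // which unfortunately includes < > & and quotes.
    $encoded = htmlentities($str, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

    // Step 2: decode only the markup characters again, so existing tags survive.
    return htmlspecialchars_decode($encoded, ENT_QUOTES);
}

echo encodeEntitiesKeepTags("Add ½ cup<br />• It’s done");
// prints: Add &frac12; cup<br />&bull; It&rsquo;s done
```

Because step 2 restores every `<` and `>`, this is only appropriate when the input is trusted markup; it does not sanitize user-supplied HTML.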
Upvotes: 4
Reputation: 8528
Try this. I've used this class to convert every non-ASCII character in a UTF-8 string to a numeric HTML entity:
class unicode_replace_entities {
    // Convert every non-ASCII character of a UTF-8 string to a decimal entity.
    public function UTF8entities($content = "") {
        $contents = $this->unicode_string_to_array($content);
        $swap = "";
        $iCount = count($contents);
        for ($o = 0; $o < $iCount; $o++) {
            $contents[$o] = $this->unicode_entity_replace($contents[$o]);
            $swap .= $contents[$o];
        }
        return mb_convert_encoding($swap, "UTF-8"); //not really necessary, but why not.
    }
    // Split a UTF-8 string into an array of single characters.
    public function unicode_string_to_array($string) { //adjwilli
        $array = array(); // initialised so an empty string yields an empty array
        $strlen = mb_strlen($string, "UTF-8");
        while ($strlen) {
            $array[] = mb_substr($string, 0, 1, "UTF-8");
            $string = mb_substr($string, 1, $strlen, "UTF-8");
            $strlen = mb_strlen($string, "UTF-8");
        }
        return $array;
    }
    // Decode one UTF-8 character to its code point and return "&#N;".
    public function unicode_entity_replace($c) { //m. perez
        $h = ord($c[0]); // $c{0} syntax was removed in PHP 8
        if ($h < 0xC2) {
            // ASCII, or a byte that cannot start a multi-byte sequence: leave as-is.
            return $c;
        }
        if ($h <= 0xDF) {
            // Two-byte sequence.
            $h = ($h & 0x1F) << 6 | (ord($c[1]) & 0x3F);
            return "&#" . $h . ";";
        } else if ($h <= 0xEF) {
            // Three-byte sequence.
            $h = ($h & 0x0F) << 12 | (ord($c[1]) & 0x3F) << 6 | (ord($c[2]) & 0x3F);
            return "&#" . $h . ";";
        } else if ($h <= 0xF4) {
            // Four-byte sequence.
            $h = ($h & 0x0F) << 18 | (ord($c[1]) & 0x3F) << 12 | (ord($c[2]) & 0x3F) << 6 | (ord($c[3]) & 0x3F);
            return "&#" . $h . ";";
        }
        return $c; // lead bytes above 0xF4 are not valid UTF-8; pass through
    }
}
$oUnicodeReplace = new unicode_replace_entities();
$result = $oUnicodeReplace->UTF8entities($string);
Mind you, it converts every non-ASCII character rather than a chosen few, but it does take care of the unusual ones. It's not my own script, and I no longer remember where I found it.
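For comparison, the mbstring extension has a built-in function, mb_encode_numericentity, that does the same kind of conversion. The following is a minimal sketch, not a drop-in replacement for the class: the wrapper name `toNumericEntities` is hypothetical, and it assumes UTF-8 input and that mbstring is installed. ASCII characters, including `<br />` tags, fall outside the conversion map and are left alone.

```php
<?php
// Hypothetical wrapper: the name is illustrative and not from this thread.
// Requires the mbstring extension; assumes $str is UTF-8.
function toNumericEntities(string $str): string
{
    // Conversion map: [first code point, last code point, offset, bit mask].
    // Everything from U+0080 to U+10FFFF becomes a decimal entity.
    $map = [0x80, 0x10FFFF, 0, 0x1FFFFF];
    return mb_encode_numericentity($str, $map, 'UTF-8');
}

echo toNumericEntities("½ cup<br />• It’s");
// prints: &#189; cup<br />&#8226; It&#8217;s
```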
Upvotes: 2
Reputation: 53950
Guess it's time to take a look at the htmlentities PHP function and its options.
Basically, you can replace your whole function with:
$str = htmlentities( $str );
It will also be a lot more efficient.
Be sure to take a look at the function's optional parameters if you need special processing (especially ENT_SUBSTITUTE).
$str = htmlentities( $str, ENT_SUBSTITUTE );
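To illustrate what the flag changes, here is a small sketch that passes the encoding explicitly. Before PHP 5.4 the default encoding of htmlentities was ISO-8859-1, so an encoding mismatch is one possible reason the fractions failed in the question. That is an assumption, since the thread does not confirm it. The variable names are illustrative.

```php
<?php
// "\xE9" is the ISO-8859-1 byte for é, which is not valid UTF-8 on its own.
$bad = "caf\xE9";

// Without ENT_SUBSTITUTE, invalid input makes htmlentities return ''.
$strict = htmlentities($bad, ENT_QUOTES, 'UTF-8');

// With ENT_SUBSTITUTE, the invalid byte is replaced and the rest is kept.
$lenient = htmlentities($bad, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

var_dump($strict === '');                // bool(true)
var_dump(strpos($lenient, 'caf') === 0); // bool(true)

// Valid UTF-8 fractions are converted to named entities.
$fractions = htmlentities("¼ ½ ¾", ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
echo $fractions, "\n"; // prints: &frac14; &frac12; &frac34;
```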
Upvotes: 3
Reputation: 2387
Yes, see htmlentities: http://www.php.net/manual/en/function.htmlentities.php or htmlspecialchars: http://www.php.net/manual/en/function.htmlspecialchars.php
Upvotes: 1