Reputation: 71
I want only the unencoded characters to be converted to HTML entities, without affecting the entities that are already present. I have a string that contains previously encoded entities, e.g.:
gaIUSHIUGhj&gt;&hyphen; hjb&times;jkn.jhuh&gt;hh&gt; &hellip;
When I use htmlentities(), the & at the beginning of each entity gets encoded again. This means &hyphen; and other entities have their & encoded to &amp;:
&amp;times;
I tried decoding the complete string, then encoding it again, but it does not seem to work properly. This is the code I tried:
header('Content-Type: text/html; charset=iso-8859-1');
...
$b = 'gaIUSHIUGhj&gt;&hyphen; hjb&times;jkn.jhuh&gt;hh&gt; &hellip;';
$b = html_entity_decode($b, ENT_QUOTES, 'UTF-8');
$b = iconv("UTF-8", "ISO-8859-1//TRANSLIT", $b);
$b = htmlentities($b, ENT_QUOTES, 'UTF-8');
But the result is still not what I expect. Is there a way to stop htmlentities() from encoding the existing entities a second time?
Upvotes: 7
Views: 5644
Reputation: 49198
You did well to look at the documentation, but you missed the best part. The signature can be hard to decipher sometimes:
// > > > > > > Scroll >>> > > > > > Keep going. > > > >>>>>> See below. <<<<<<
string htmlentities ( string $string [, int $flags = ENT_COMPAT | ENT_HTML401 [, string $encoding = 'UTF-8' [, bool $double_encode = true ]]] )
Look at the very end.
I know. Confusing. I usually ignore the signature line and go straight down to the next block (Parameters) for the blurbs on each argument.
So you want to use the $double_encode argument at the end to tell htmlentities() not to re-encode existing entities (and you probably want to stick with UTF-8 unless you have a specific reason not to):
$str = "gaIUSHIUGhj&gt;&hyphen; hjb&times;jkn.jhuh&gt;hh&gt; &hellip;";
// Double-encoded!
echo htmlentities($str, ENT_COMPAT, 'utf-8', true) . "\n";
// Not double-encoded!
echo htmlentities($str, ENT_COMPAT, 'utf-8', false);
https://ignite.io/code/513ab23bec221e4837000000
Upvotes: 5
Reputation: 324630
Set the optional $double_encode parameter to false. See the documentation for more information.
Your resulting code should look like:
$b = htmlentities($b, ENT_QUOTES, 'UTF-8', false);
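To see the difference side by side, here is a minimal sketch. The $input string is a made-up example (not the string from the question) that mixes two existing entities with raw characters that still need encoding; the variable names are only for illustration.

```php
<?php
// Hypothetical input: "&gt;" and "&times;" are already encoded,
// while <, >, the bare & and the double quotes are still raw.
$input = 'a &gt; b &times; c <d> & "e"';

// Default behaviour ($double_encode = true): every & is escaped,
// including the ones that start an existing entity.
$double = htmlentities($input, ENT_QUOTES, 'UTF-8', true);

// $double_encode = false: existing entities are left alone,
// only the raw characters are converted.
$single = htmlentities($input, ENT_QUOTES, 'UTF-8', false);

echo $double, "\n"; // a &amp;gt; b &amp;times; c &lt;d&gt; &amp; &quot;e&quot;
echo $single, "\n"; // a &gt; b &times; c &lt;d&gt; &amp; &quot;e&quot;
```

Note that the bare & is still escaped in both cases, which is what you want. I would also test with the exact entities in your data, since which named entities count as already encoded may depend on the document-type flag (ENT_HTML401 versus ENT_HTML5).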
Upvotes: 6