Reputation: 20431
I am trying to convert all &nbsp; instances to regular spaces in PHP:
echo '<meta charset="UTF-8" /> ';
echo html_entity_decode('&nbsp;');
echo html_entity_decode('&nbsp;', ENT_COMPAT, 'UTF-8');
If the first line is commented out, then the output will be in ISO 8859-1 and read:
Â
Where there is a space in front. If UTF-8 encoding is specified, it reads:
�
Which is an undefined UTF-8 character followed by a space. Is there any way to ensure that all HTML entity spaces are correctly decoded regardless of the encoding?
The space character is really just an example; what I am trying to do is read HTML input from an unspecified charset and display it. So &lt; and &#60; would both become <.
Upvotes: 2
Views: 4390
Reputation: 9468
This is a problem with encodings: they are not compatible with each other, so you have to use different options in html_entity_decode for every encoding. However, you can convert the input to UTF-8 first (with iconv) and then call html_entity_decode($string, ENT_COMPAT, 'UTF-8').
If you don't know the encoding of the input, you have to guess.
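A minimal sketch of the convert-then-decode approach described above. The function name decode_entities_utf8 is hypothetical, and the guess is deliberately crude: input that is valid UTF-8 is kept as-is, and anything else is assumed to be ISO-8859-1. A different fallback charset may be needed for real input.

```php
<?php
// Hypothetical helper: normalise the input to UTF-8, then decode entities.
function decode_entities_utf8(string $input): string
{
    // preg_match with the /u modifier returns 1 only if $input is valid UTF-8.
    $isUtf8 = preg_match('//u', $input) === 1;

    // Assumption: anything that is not valid UTF-8 is treated as ISO-8859-1.
    $utf8 = $isUtf8 ? $input : iconv('ISO-8859-1', 'UTF-8', $input);

    // Decode named and numeric entities into UTF-8 characters.
    return html_entity_decode($utf8, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

// Usage: send a matching charset header before output, for example
// header('Content-Type: text/html; charset=utf-8');
// echo decode_entities_utf8($rawInput);
```

With this in place, &lt; and &#60; both come out as <, and &nbsp; comes out as the two bytes \xc2\xa0, so the page must be served as UTF-8 for the result to display correctly.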
Upvotes: 1
Reputation: 668
Why not send a header first?
header("Content-type: text/html; charset=utf-8");
echo html_entity_decode("&nbsp;", ENT_COMPAT, 'UTF-8');
Upvotes: 0
Reputation: 125454
&nbsp; is not a space. It is byte 160 (0xA0) in ISO 8859-1, and in UTF-8 it is \xc2\xa0. As the name non-breaking space implies, the browser will not break the line at it.
If you want a space you will have to replace it with a space.
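As a rough illustration of that replacement, assuming the text is handled as UTF-8 (the function name entities_to_plain_spaces is made up for this example):

```php
<?php
// Hypothetical helper: decode entities, then turn non-breaking spaces
// into ordinary spaces.
function entities_to_plain_spaces(string $html): string
{
    // After decoding as UTF-8, &nbsp; becomes the byte pair \xC2\xA0.
    $decoded = html_entity_decode($html, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Replace each non-breaking space with a regular space (0x20).
    return str_replace("\xC2\xA0", ' ', $decoded);
}

// Usage: echo entities_to_plain_spaces('a&nbsp;b');
```

Note that this also replaces non-breaking spaces that were already present as raw characters in the input, not only those written as &nbsp;.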
Upvotes: 4