Aram Kocharyan

Reputation: 20431

UTF-8 and ISO 8859-1 encoding in PHP

I am trying to convert all &nbsp; instances to regular spaces in PHP:

echo '<meta charset="UTF-8" /> ';
echo html_entity_decode('&nbsp;');
echo html_entity_decode('&nbsp;', ENT_COMPAT, 'UTF-8');

If the first line is commented out, then the output will be in ISO 8859-1 and read:

 Â

Where there is a space in front. If UTF-8 encoding is specified, it reads:

Which is an undefined UTF-8 character followed by a space. Is there any way to ensure that all HTML entity spaces are correctly decoded regardless of the encoding?

The space character is really just an example; what I am trying to do is read HTML input from an unspecified charset and display it. So &lt; and &#60; would both become <.

Upvotes: 2

Views: 4390

Answers (3)

Michas

Reputation: 9468

This is a problem with encodings: they are not compatible. You have to pass a matching charset argument to html_entity_decode for every encoding. However, you may convert the input to UTF-8 first (with iconv) and then use html_entity_decode($string, ENT_COMPAT, 'UTF-8').

If you don't know the encoding of the input, you have to guess.
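
A minimal sketch of the convert-then-decode idea, assuming the only candidate encodings are UTF-8 and ISO 8859-1. The function name to_utf8_decoded is hypothetical, and checking validity with preg_match('//u', ...) is just one possible way to guess; real input may need a better heuristic.

```php
<?php
// Hypothetical helper: guess the charset, normalise to UTF-8, then decode entities.
function to_utf8_decoded(string $html): string
{
    // With the /u modifier, preg_match returns 1 only if the subject is valid UTF-8.
    $isUtf8 = preg_match('//u', $html) === 1;

    if (!$isUtf8) {
        // Assumption: anything that is not valid UTF-8 is treated as ISO 8859-1.
        $html = iconv('ISO-8859-1', 'UTF-8', $html);
    }

    // Both named (&lt;) and numeric (&#60;) entities are decoded into UTF-8 text.
    return html_entity_decode($html, ENT_COMPAT, 'UTF-8');
}
```

The result is always UTF-8, so the page should be served with a UTF-8 charset declaration as well.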

Upvotes: 1

Chris

Reputation: 668

Why not send a header first?

header("Content-type: text/html; charset=utf-8");
echo html_entity_decode("&nbsp;", ENT_COMPAT, 'UTF-8');

Upvotes: 0

Clodoaldo Neto

Reputation: 125454

&nbsp; is not a space. It is the byte 160 (0xA0) in ISO 8859-1, and in UTF-8 it is \xc2\xa0. As the name non-breaking space implies, the browser will not break a line at it.

If you want a space you will have to replace it with a space.
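
One way this replacement might look, assuming the input is already UTF-8. The function name decode_with_plain_spaces is hypothetical.

```php
<?php
// Hypothetical helper: decode entities as UTF-8, then turn U+00A0 into a plain space.
function decode_with_plain_spaces(string $utf8Html): string
{
    $decoded = html_entity_decode($utf8Html, ENT_COMPAT, 'UTF-8');

    // "\xC2\xA0" is the UTF-8 byte sequence for the non-breaking space.
    return str_replace("\xC2\xA0", ' ', $decoded);
}
```

For example, decode_with_plain_spaces('a&nbsp;b&#160;c') should give 'a b c' with ordinary spaces. Replacing the single byte "\xA0" instead would only be safe on ISO 8859-1 text, since in UTF-8 that byte can be part of other characters.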

Upvotes: 4
