PHP HTML encoding

Question

I'm trying to parse an HTML page, but the encoding is messing up my results. After some research I found a very popular solution using utf8_encode() and utf8_decode(), but it doesn't change anything. My code and its output are below.

Code

$str_html = $this->curlHelper->file_get_contents_curl($page);
$str_html = utf8_encode($str_html);

$dom = new DOMDocument();
$dom->resolveExternals = true;
$dom->substituteEntities = false;
@$dom->loadHTML($str_html);
$xpath = new DomXpath($dom);

(...)
$profile = array();
for ($index = 0; $index < $table_lines->length; $index++) {
    $desc = utf8_decode($table_lines->item($index)->firstChild->nodeValue);
}

Output

Testar Ã© bom

Should be

Testar é bom

What I've tried

htmlentities():

htmlentities($table_lines->item($index)->lastChild->nodeValue, ENT_NOQUOTES, ini_get('ISO-8859-1'), false);

htmlspecialchars():

htmlspecialchars($table_lines->item($index)->lastChild->nodeValue, ENT_NOQUOTES, 'ISO-8859-1', false);

Changing my file's charset as described here.

Some more information

Website encoding:

Thanks in advance!

Cobra_Fast · Accepted Answer

Try using the following without a prior utf8_decode():

mb_convert_encoding($str, 'ISO-8859-1', 'UTF-8');
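As a rough sketch of what that call does, assuming the node value is UTF-8 (which is what DOMDocument returns) and the page is displayed as ISO-8859-1: the variable $nodeValue below is a hypothetical stand-in for ->nodeValue, and the mbstring extension is assumed to be available. The last line runs the conversion in the wrong direction to show one way the "Ã©" pattern can arise.

```php
<?php
// Hypothetical stand-in for $table_lines->item($index)->firstChild->nodeValue.
// "\u{00E9}" is "é", stored here as two UTF-8 bytes (0xC3 0xA9).
$nodeValue = "Testar \u{00E9} bom";

// UTF-8 -> ISO-8859-1: "é" becomes the single byte 0xE9.
$desc = mb_convert_encoding($nodeValue, 'ISO-8859-1', 'UTF-8');

var_dump(strlen($nodeValue)); // int(13): "é" takes two bytes in UTF-8
var_dump(strlen($desc));      // int(12): "é" takes one byte in ISO-8859-1

// For comparison only: treating the UTF-8 bytes as ISO-8859-1 and encoding
// them to UTF-8 a second time yields "Testar Ã© bom".
$garbled = mb_convert_encoding($nodeValue, 'UTF-8', 'ISO-8859-1');
```

If the output still looks wrong after this, it is worth checking whether the string was already encoded twice earlier in the pipeline, for instance by calling utf8_encode() on input that was UTF-8 to begin with.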

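Alternatively, don't use utf8_decode() and change your website's meta charset declaration to UTF-8, so the page is served in the same encoding as the strings DOMDocument returns.

See also: mb_convert_encoding()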
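The alternative could look roughly like this, assuming the meta in question is a UTF-8 charset declaration (the exact tag is not shown above). The function render_page() and the variable $nodeValue are hypothetical names used only for illustration.

```php
<?php
// Hypothetical helper: wraps a UTF-8 string in a minimal page that
// declares UTF-8 as its charset.
function render_page(string $desc): string
{
    // With an explicit 'UTF-8' charset, htmlspecialchars() only escapes
    // HTML metacharacters (&, <, >) and leaves "é" untouched.
    $safe = htmlspecialchars($desc, ENT_NOQUOTES, 'UTF-8');

    return "<!DOCTYPE html>\n"
         . "<html><head><meta charset=\"UTF-8\"></head>"
         . "<body><p>{$safe}</p></body></html>";
}

// Stand-in for ->nodeValue, which DOMDocument returns as UTF-8.
$nodeValue = "Testar \u{00E9} bom";

// No utf8_decode() anywhere: the string stays UTF-8 from the DOM to the page.
$page = render_page($nodeValue);
echo $page;
```

The idea is to keep a single encoding end to end instead of converting back to ISO-8859-1 for display. If the server sends its own Content-Type header with a different charset, that header would also need to say UTF-8.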
