Doon

Reputation: 3749

PHP HTML encoding

I'm trying to parse an HTML page, but the encoding is messing up my results. After some research I found a very popular solution using utf8_encode() and utf8_decode(), but it doesn't change anything. My code and its output are below.

Code

$str_html = $this->curlHelper->file_get_contents_curl($page);
$str_html = utf8_encode($str_html);

$dom = new DOMDocument();
$dom->resolveExternals = true;
$dom->substituteEntities = false;
@$dom->loadHTML($str_html);
$xpath = new DomXpath($dom);

(...)
$profile = array();
for ($index = 0; $index < $table_lines->length; $index++) {
    $desc = utf8_decode($table_lines->item($index)->firstChild->nodeValue);
}

Output

Testar é bom

Should be

Testar é bom

What I've tried

Some more information

Thanks in advance!

Upvotes: 0

Views: 257

Answers (1)

Cobra_Fast

Reputation: 16061

Try using the following without a prior utf8_decode():

mb_convert_encoding($str, 'ISO-8859-1', 'UTF-8');
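To illustrate why this can help, here is a minimal sketch of the double-encoding effect. It assumes the fetched page is already UTF-8 (the question does not say so) and that the mbstring extension is available. The variable names are made up for the example. The first conversion stands in for utf8_encode(), which treats its input as ISO-8859-1.

```php
<?php
// Assumption: the page bytes are already UTF-8.
$original = "Testar \xC3\xA9 bom"; // "Testar é bom" as UTF-8 bytes

// Same effect as utf8_encode(): read the bytes as ISO-8859-1, emit UTF-8.
// Applied to text that is already UTF-8, this encodes "é" a second time.
$doubleEncoded = mb_convert_encoding($original, 'UTF-8', 'ISO-8859-1');

// The conversion suggested above: UTF-8 -> ISO-8859-1 undoes that extra step.
$restored = mb_convert_encoding($doubleEncoded, 'ISO-8859-1', 'UTF-8');

// Hex dumps make the byte-level difference visible regardless of terminal encoding.
echo bin2hex($original), PHP_EOL;
echo bin2hex($doubleEncoded), PHP_EOL;
echo ($restored === $original) ? "round trip ok" : "mismatch", PHP_EOL;
```

If the round trip restores the expected text, the input was most likely UTF-8 to begin with and the utf8_encode() call can simply be removed.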

Alternatively, drop utf8_decode() and change the page's meta tag to:

<meta http-equiv="content-type" content="text/html; charset=UTF-8" />
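If you cannot edit the page itself, one possible workaround is to add that meta tag to the markup before handing it to DOMDocument. This is only a sketch: it assumes the fetched content is UTF-8 and has no charset declaration of its own, and the sample markup and variable names are invented for the example.

```php
<?php
// Assumption: the fetched body is UTF-8 and declares no charset itself.
$body = "<p>Testar \xC3\xA9 bom</p>";
$meta = '<meta http-equiv="content-type" content="text/html; charset=UTF-8" />';
$html = "<html><head>{$meta}</head><body>{$body}</body></html>";

$dom = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of using @
$dom->loadHTML($html);
libxml_clear_errors();

// With the charset declared, nodeValue should come back as UTF-8,
// so no utf8_encode()/utf8_decode() calls are needed.
$text = $dom->getElementsByTagName('p')->item(0)->nodeValue;
echo bin2hex($text), PHP_EOL;
```

Without the declaration, loadHTML() tends to fall back to ISO-8859-1, which is one plausible source of the garbled output in the question.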

Upvotes: 3
