Munib

Reputation: 3701

PHP DOMDocument is not rendering Unicode Characters Properly

I am using CKEditor to let users post comments. Users can also put Unicode characters in the comment box.

When I submit the form and check $_POST["reply"], the Unicode characters are displayed correctly. I have also sent header('Content-type:text/html; charset=utf-8'); at the top of the page. But when I process the data with PHP DOMDocument, all the characters become unreadable.

$html_unicode = "xyz unicode data";
$html_data = '<body>'.$html_unicode . '</body>';
$dom = new DOMDocument();
$dom->loadHTML($html_data );

$elements = $dom->getElementsByTagName('body');

When I echo

echo $dom->textContent;

The Output becomes

§Ø³ÙبÙÙ ÙÙÚº غرÙب ک٠آÙÛ ÙÛÙ

How can I get the proper Unicode characters back when using PHP DOMDocument?

Upvotes: 12

Views: 3206

Answers (4)

Andre

Reputation: 2467

This worked for me:

$html_unicode = "xyz unicode data";
$html_data = '<body>'.$html_unicode . '</body>';

$dom = new DOMDocument();
$html_data = mb_convert_encoding($html_data, 'HTML-ENTITIES', 'UTF-8'); // requires the mbstring extension
$dom->loadHTML($html_data);

$elements = $dom->getElementsByTagName('body');
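
Note that passing 'HTML-ENTITIES' to mb_convert_encoding() is deprecated as of PHP 8.2. A minimal sketch of the same idea using mb_encode_numericentity() might look like the following; the sample string is a hypothetical stand-in for the user's comment, and the conversion map (every code point from 0x80 upward) is an assumption about what needs escaping:

```php
<?php
// Hypothetical sample input standing in for the submitted comment (UTF-8).
$html_unicode = 'سلام دنیا';
$html_data = '<body>' . $html_unicode . '</body>';

// Turn every non-ASCII code point into a numeric entity such as &#1587;
// so the parser no longer has to guess the byte encoding.
// Map format: [first code point, last code point, offset, mask].
$map = [0x80, 0x10FFFF, 0, 0x1FFFFF];
$html_data = mb_encode_numericentity($html_data, $map, 'UTF-8');

$dom = new DOMDocument();
$dom->loadHTML($html_data);

// DOM strings are returned as UTF-8 once the entities are resolved.
$text = $dom->getElementsByTagName('body')->item(0)->textContent;
echo $text, PHP_EOL;
```

Because the markup handed to loadHTML() is pure ASCII after this step, the result should not depend on which default encoding the parser assumes.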

Upvotes: 23

Ashraf

Reputation: 747

This worked for Arabic text:

<?php
echo "<html><head><meta http-equiv=\"Content-Type\" content=\"text/html; charset=Windows-1256\"></head><body>";
$html = file_get_contents("    url    ");
$dom = new DOMDocument();
@$dom->loadHTML($html);
$ExTEXT = $dom->getElementById('tag id');
echo utf8_decode($ExTEXT->textContent);
echo "</body></html>";

Upvotes: 1

Munib

Reputation: 3701

I found the solution. Declaring the charset inside the markup fixes it. Just replace

$html_data = '<body>'.$html_unicode . '</body>';

with

$html_data = '<head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head>'
    . '<body>' . $html_unicode . '</body>';
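
For completeness, here is a small end-to-end sketch of this fix. The sample string is hypothetical, and the script assumes it is saved as UTF-8:

```php
<?php
// Hypothetical sample input standing in for the submitted comment (UTF-8).
$html_unicode = 'سلام دنیا';

// The meta tag tells the HTML parser to decode the bytes as UTF-8.
$html_data = '<head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head>'
    . '<body>' . $html_unicode . '</body>';

$dom = new DOMDocument();
$dom->loadHTML($html_data);

// Read the text back from the <body> element only.
$text = $dom->getElementsByTagName('body')->item(0)->textContent;
echo $text, PHP_EOL;
```

Reading textContent from the body element rather than from the whole document keeps anything inside the added head out of the result.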

Upvotes: 2

Rohit Subedi

Reputation: 550

Try this :)

<?php
    $html_unicode = "xyz unicode data";
    $html_data = '<body>'.$html_unicode . '</body>';
    $dom = new DOMDocument();
    $dom->loadHTML($html_data );

    $elements = $dom->getElementsByTagName('body');
    echo utf8_decode($dom->textContent);
?>
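
utf8_decode() helps here presumably because the parser read the UTF-8 bytes as ISO-8859-1 and re-encoded each byte as its own character. Keep in mind that utf8_decode() is deprecated as of PHP 8.2. The sketch below only simulates that double encoding with mbstring (it does not call DOMDocument) and then reverses it; the sample string is hypothetical:

```php
<?php
// Hypothetical sample input (UTF-8).
$original = 'سلام';

// Simulate the damage: treat each UTF-8 byte as an ISO-8859-1 character
// and encode the result as UTF-8 again.
$garbled = mb_convert_encoding($original, 'UTF-8', 'ISO-8859-1');

// Reverse it: map each character back to a single byte. This is what
// utf8_decode() does, written with a call that is not deprecated.
$restored = mb_convert_encoding($garbled, 'ISO-8859-1', 'UTF-8');

var_dump($garbled === $original);
var_dump($restored === $original);
```

This round trip is only lossless when the text really was misread as ISO-8859-1. Declaring the encoding before parsing avoids the problem at the source.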

Upvotes: 7
