Korpel

Reputation: 2458

How can I change the encoding from plain text to Unicode so that I can read special characters from an HTML page?

Below is my code :

<?php
// example of how to use a basic selector to retrieve HTML contents
include('/Library/WebServer/Documents/simple_html_dom.php');  // the Simple HTML DOM library

// get DOM from URL or file
$html = file_get_html('http://www.google.hk');

// extract text from table
echo $html->find('td[align="top"]', 1)->innertext.'<br><hr>';

// extract text from HTML
echo $html->innertext;
?>

I am using the Simple HTML DOM (simple_html_dom) library. When I run my PHP program on my local server, I get many unrecognized characters, apparently because the text is not being output in an encoding that can represent them. Can someone tell me what I need to use instead of innertext so that all the characters show up correctly? PS: I also tried plaintext without any luck, and textContent seems broken to me. Perhaps I need to try a different element first? Thanks.

Upvotes: 1

Views: 660

Answers (1)

Pedro Lobito

Reputation: 98881

echo utf8_encode($html->innertext);

Or

echo utf8_decode($html->innertext);

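Which one you need depends on the original encoding: utf8_encode converts ISO-8859-1 to UTF-8, and utf8_decode converts UTF-8 to ISO-8859-1, so you may want to try both.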


Note: If you're viewing the output in a browser, make sure Unicode (UTF-8) is selected as the text encoding, or put the following code at the top of your script.

header('Content-Type: text/html; charset=utf-8');
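If trying both functions by hand feels fragile, here is a minimal sketch of an alternative that only converts when the text is not already valid UTF-8. It assumes the mbstring extension is available; to_utf8 is a hypothetical helper name, not part of simple_html_dom, and the ISO-8859-1 fallback is only a guess at the page's real encoding. Note also that utf8_encode and utf8_decode are deprecated as of PHP 8.2.

```php
<?php
// Hypothetical helper (not part of simple_html_dom): return $text as UTF-8.
// $fallback is an ASSUMED source encoding, used only when $text is not valid UTF-8.
function to_utf8(string $text, string $fallback = 'ISO-8859-1'): string
{
    // Already valid UTF-8: return it unchanged to avoid double-encoding.
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text;
    }
    // Otherwise treat the bytes as $fallback and convert them to UTF-8.
    return mb_convert_encoding($text, 'UTF-8', $fallback);
}
```

With this in place, the output line would become echo to_utf8($html->innertext); placed after the header() call above. If the page is actually in some other encoding (Big5, for instance), pass that name as the second argument instead of relying on the default.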

Upvotes: 1
