Reputation: 416
Im using PHP html_simple_dom.
The targeted site is using UTF-8. My php as well as the stream context are set to use UTF 8.
An element (which i inspect by browser) has an innerHTML of "AAA ' BBB"
, at least as far as when its rendering using my firefox and chrome browsers.
However, my PHP script always fetches this string as "AAA ' BBB"
.
I can fix this using htmlspecialchars_decode($string, 1), but i really want to know why the PHP script, or rather the website is ("wrongly) encoding the string in the first place when visiting it using my PHP, which is explicitly set to UTF
header('Content-Type: text/html; charset=utf-8');
define("CONTEXT", stream_context_create(
array(
"http" =>
array(
"header" => 'Content-Type: text/html; charset=utf-8'
// also tried 'header' => 'Accept-Charset: UTF-8'
)
)
)
);
targetsite reads UTF-8 - http://mtggoldfish.com.cutercounter.com/
$html = file_get_html($url, false, CONTEXT);
// do things, blurts out every "'" as encoded '
Upvotes: 0
Views: 42
Reputation: 2438
Browser inspectors do a bit of transformation to have something human-readable.
Create a simple HTML with only AAA ' BBB
in the body, you will see AAA ' BBB
in the inspectors.
If you really want to see the content of the page, look at the source code (which is what file_get_html
gets)
Upvotes: 1