Reputation: 1842
I need to remove all the dodgy HTML characters from a website I'm parsing with cURL and Simple HTML DOM.
<?php
$html = "this&nbsp;is a text";
var_dump($html);
var_dump(html_entity_decode($html,ENT_COMPAT,"UTF-8"));
Which outputs:
string(19) "this&nbsp;is a text"
string(15) "this is a text"
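To see what is actually happening to that string, here is a minimal sketch that dumps the raw bytes before and after decoding. It assumes the input really contains the `&nbsp;` entity, which matches the reported lengths of 19 and 15 bytes.

```php
<?php
// Assumes the scraped string contains the &nbsp; entity (19 bytes in total).
$html = "this&nbsp;is a text";
$decoded = html_entity_decode($html, ENT_COMPAT, "UTF-8");

// bin2hex() shows every byte, so the invisible non-breaking space becomes visible.
echo strlen($html), ' ', bin2hex($html), PHP_EOL;       // "&nbsp;" is still 6 literal bytes
echo strlen($decoded), ' ', bin2hex($decoded), PHP_EOL; // it is now c2a0 (U+00A0), 2 bytes
```

The decoded string is not broken: the entity became a real UTF-8 non-breaking space, which is why the length drops by 4.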
I don't want to use the preg_* functions because there are other characters in the text (e.g. °). This is driving me insane now!
Thanks, James
Upvotes: 1
Views: 1414
Reputation: 8529
You need to specify your output encoding with a header:
<?php
header('Content-Type: text/html; charset=utf-8');
$html = "this&nbsp;is a text";
var_dump($html);
var_dump(html_entity_decode($html,ENT_COMPAT,"UTF-8"));
?>
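The browser does not assume UTF-8 by default; that's why it displays the wrong character. html_entity_decode() turns &nbsp; into U+00A0 (non-breaking space), which takes two bytes in UTF-8 (0xC2 0xA0), so the length drops from 19 to 15. Without the header, the browser interprets those two bytes in a different encoding.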
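If the goal is to get rid of the non-breaking space rather than just display it correctly, one option (an assumption about what is wanted, not part of the answer above) is to decode first and then swap U+00A0 for an ordinary space with str_replace(). That needs no preg_* call and leaves characters such as ° alone. The sample string is made up for illustration.

```php
<?php
// Tell the browser the page is UTF-8 (has no visible effect when run from the CLI).
header('Content-Type: text/html; charset=utf-8');

// Hypothetical sample input: one &nbsp; plus a degree-sign entity.
$html = "this&nbsp;is a text, 20&deg;C";
$decoded = html_entity_decode($html, ENT_COMPAT, "UTF-8");

// Replace U+00A0 (UTF-8 bytes C2 A0) with a plain space; the ° stays as it is.
$clean = str_replace("\u{00A0}", ' ', $decoded);

var_dump($clean);
```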
Upvotes: 4
Reputation: 219804
If that's the only entity that needs replacing, just use str_replace():
var_dump(str_replace('&nbsp;', ' ', "this&nbsp;is a text"));
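A sketch of the same idea for a few spellings, assuming the scraped pages may also write the non-breaking space as the numeric entity &#160;. str_replace() accepts an array of search strings, and html_entity_decode() then handles whatever entities remain, such as &deg;. The sample input is invented.

```php
<?php
// Hypothetical input mixing the named and numeric forms of the non-breaking space.
$html = "this&nbsp;is&#160;a text, 20&deg;C";

// Replace both spellings of the non-breaking space with an ordinary space first...
$noNbsp = str_replace(['&nbsp;', '&#160;'], ' ', $html);

// ...then decode the remaining entities (&deg; becomes the UTF-8 degree sign).
$text = html_entity_decode($noNbsp, ENT_QUOTES, 'UTF-8');

var_dump($text);
```

Doing the replacement before decoding means only the entities you list are turned into plain spaces; everything else is decoded normally.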
Upvotes: 1