Reputation: 33
I am having trouble decoding entities in the title of this YouTube video:
http://www.youtube.com/watch?v=p7NMsywVQhY
Here is my code:
$url = 'http://www.youtube.com/watch?v=p7NMsywVQhY';
$html = @file_get_contents($url);
$doc = new DOMDocument();
@$doc->loadHTML($html);
$nodes = $doc->getElementsByTagName('title');
$title = $nodes->item(0)->nodeValue;
//decode the '&#x202a;' in the title
$title = html_entity_decode($title,ENT_QUOTES,'UTF-8'); //does not seem to have any effect
//decode the utf data
$title = utf8_decode($title);
$title comes back fine, except that question marks appear where &#x202a;
is in the original title.
Thanks.
Upvotes: 2
Views: 1575
Reputation: 340
Try this to force correct detection of the charset:
$doc = new DOMDocument();
@$doc->loadHTML('<?xml encoding="UTF-8">' . $html);
$nodes = $doc->getElementsByTagName('title');
$title = $nodes->item(0)->nodeValue;
echo $title;
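As a rough, self-contained sketch of the same idea (the inline $html string is a made-up stand-in for the fetched page, not the real YouTube markup): nodeValue comes back as UTF-8 with the entity already decoded, so the question marks most likely come from the later utf8_decode() call, because ISO-8859-1 has no character for U+202A. Stripping the directional marks is one option, assuming you don't need them in the output:

```php
<?php
// Hypothetical stand-in for the fetched page; not the real YouTube HTML.
$html = '<html><head><title>&#x202a;Some video&#x202c; - YouTube</title></head><body></body></html>';

$doc = new DOMDocument();
// The XML declaration prefix hints to libxml that the input is UTF-8.
@$doc->loadHTML('<?xml encoding="UTF-8">' . $html);
$title = $doc->getElementsByTagName('title')->item(0)->nodeValue;

// DOM returns UTF-8 with entities already decoded, so U+202A is present
// here as the bytes E2 80 AA; html_entity_decode() has nothing left to do.
$hasMark = strpos($title, "\xE2\x80\xAA") !== false;

// Remove the Unicode directional formatting characters (U+202A..U+202E)
// instead of converting to ISO-8859-1, which cannot represent them.
$clean = preg_replace('/[\x{202A}-\x{202E}]/u', '', $title);

echo $clean, "\n";
```

If the page that prints the title is itself served as UTF-8, the utf8_decode() step can probably be dropped altogether.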
Upvotes: 0
Reputation: 18721
I don't know whether PHP provides a built-in function for this, but you can use preg_replace
like this:
$string = preg_replace('/&#x([0-9a-f]+);/ei', 'chr(hexdec("$1"))', $string);
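Note that the /e modifier was deprecated in PHP 5.5 and removed in PHP 7, and chr() only produces a single byte, so it cannot represent a code point such as 0x202A. A possible variant, sketched under the assumption of PHP 7.2+ with the mbstring extension (decode_hex_entities is a hypothetical helper name, not a built-in):

```php
<?php
// Sketch: decode hexadecimal numeric entities such as &#x202a; into UTF-8.
// Assumes PHP 7.2+ with mbstring (for mb_chr); the function name is made up.
function decode_hex_entities(string $s): string
{
    return preg_replace_callback(
        '/&#x([0-9a-f]{1,6});/i',
        function (array $m): string {
            // hexdec() yields the code point; mb_chr() encodes it as UTF-8.
            $char = mb_chr(hexdec($m[1]), 'UTF-8');
            // mb_chr() returns false for invalid code points; keep the entity then.
            return $char === false ? $m[0] : $char;
        },
        $s
    );
}

$string = decode_hex_entities('&#x202a;Some title');
echo bin2hex($string), "\n";
```

This only helps if the string still contains the literal entity text; after DOMDocument has parsed the page, the entity has normally been decoded already.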
Upvotes: 1