Extracting text between html tags with multiple classes with DOM and XPATH

Question

I am trying to extract the text between HTML tags, but I cannot get it to work:

HTML - Text to be extracted (http://www.alexa.com/siteinfo/google.com)

<span class="font-4 box1-r">3,757,209</span>

PHP

$data = frontend::file_get_contents_curl('http://www.alexa.com/siteinfo/'.$domain); // custom function that returns the HTML string
$dom = new DOMDocument();
$dom->loadHTML(htmlentities($data));
$xpath = new DOMXpath($dom);
$backlinks = $xpath->query('//span[@class="font-4 box1-r"]/text()');
var_dump($backlinks); // returns null

har07 · Accepted Answer

The actual problem is that htmlentities() escapes all tag delimiters (< and >), so DOMDocument ends up loading one long text string that contains no elements or attributes:

$data = <<<HTML
<div><span class="font-4 box1-r">3,757,209</span></div>
HTML;
$doc = new DOMDocument();
$doc->loadHTML(htmlentities($data));
echo $doc->saveXML();

eval.in demo (problem) eval.in demo (solution)

output (relevant part; the markup survives only as escaped text, not as elements):

&lt;div&gt;&lt;span class="font-4 box1-r"&gt;3,757,209&lt;/span&gt;&lt;/div&gt;
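
Following that diagnosis, one possible fix is to pass the raw HTML string to loadHTML() without htmlentities(), so the parser sees real elements. The sketch below reuses the sample markup from the demo rather than the asker's fetch function. Two things in it are assumptions that are not part of the original answer: the libxml_use_internal_errors() call, added on the guess that warnings from a real, imperfect page should be kept quiet, and the second, token-based XPath, which is shown only as a more tolerant alternative if the class order or the set of classes ever changes.

```php
<?php
// Sample markup from the demo; a real page would come from the asker's
// own fetch function instead.
$data = <<<HTML
<div><span class="font-4 box1-r">3,757,209</span></div>
HTML;

// Assumption: collect parser warnings internally instead of printing them,
// since real-world pages are often not valid HTML.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($data);   // raw HTML, no htmlentities()
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Exact match on the whole class attribute, as in the question.
$nodes = $xpath->query('//span[@class="font-4 box1-r"]/text()');
$texts = [];
foreach ($nodes as $node) {
    $texts[] = trim($node->nodeValue);
}

// Alternative (not from the original answer): match a single class token,
// regardless of order or of other classes on the element.
$spans = $xpath->query(
    '//span[contains(concat(" ", normalize-space(@class), " "), " box1-r ")]'
);
$tokenTexts = [];
foreach ($spans as $span) {
    $tokenTexts[] = trim($span->textContent);
}

print_r($texts);       // contains the single string "3,757,209"
print_r($tokenTexts);  // same single string for this sample
```

The exact-match query only works while the class attribute is literally "font-4 box1-r"; the token form is a common XPath 1.0 idiom for cases where that cannot be relied on.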
