Reputation: 41865
I have the following script, which works almost fine except for two things:
<note>, <to>, or <?xml version="1.0" encoding="ISO-8859-1"?>
//text()[not(self::script)]
but this breaks the XPath.
Script:
$contents = file_get_contents("http://www.w3schools.com/php/php_xml_dom.asp");
$dom = new DOMDocument();
// Parser options such as preserveWhiteSpace are read at load time, so set them first.
$dom->preserveWhiteSpace = false;
@$dom->loadHTML($contents); // @ suppresses warnings about malformed HTML
$xpath = new DOMXPath($dom);
// see http://www.w3schools.com/xpath/xpath_syntax.asp
$textNodes = $xpath->evaluate("//text()");
for ($i = 0; $i < $textNodes->length; $i++) {
    echo $textNodes->item($i)->nodeValue;
}
Do you have a better solution for extracting text from a webpage?
Note: I could simply use strip_tags, but I want to stick with DOMDocument.
Upvotes: 0
Views: 2169
Reputation: 9034
I've always used Simple HTML DOM (http://simplehtmldom.sourceforge.net/) for this, and it has worked every time.
Upvotes: 2
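For anyone who wants to stay with DOMDocument, as the question asks, here is a minimal sketch of one way to skip script contents. It rests on the observation that `self::script` tests the text node itself, and a text node is never a `script` element, so that predicate cannot filter anything; testing the node's ancestors should work instead. The sample markup in `$html` and the extra `style` exclusion are my own assumptions for illustration, not part of the original script.

```php
<?php
// Hypothetical sample markup standing in for the fetched page.
$html = '<html><head><style>p { color: red; }</style></head>'
      . '<body><p>Hello <b>world</b></p>'
      . '<script>var x = 1;</script></body></html>';

// Collect libxml parse warnings internally instead of silencing them with @.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Select text nodes that do not sit anywhere inside <script> or <style>.
$nodes = $xpath->query('//text()[not(ancestor::script) and not(ancestor::style)]');

$parts = [];
foreach ($nodes as $node) {
    $text = trim($node->nodeValue);
    if ($text !== '') {
        $parts[] = $text; // keep only non-empty pieces of text
    }
}

$result = implode(' ', $parts);
echo $result, PHP_EOL;
```

To run it against a real page, replace `$html` with the result of `file_get_contents(...)`. Joining the pieces with a single space is a simplification; it does not try to reproduce the page's original spacing.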