Reputation: 880
I'm new to web programming. I need to retrieve some information from a web page. I have the URL of the page, so I want to get its HTML source, convert it to XML, and then use PHP's DOM functions to fetch the information I need.
My PHP code is this:
$url=$_POST['url']; //url
$doc_html=new DOMDocument();
$doc_html->loadHTML($url); //html page
$doc_xml=new DOMDocument();
$doc_xml->loadXML($doc_html->saveXML()); //xml converted page
$nome_app=new DOMElement($doc_xml->getElementById('title'));
echo $nome_app->nodeValue;
I get this fatal error:
Uncaught exception 'DOMException' with message 'Invalid Character Error' on this line:
$nome_app=new DOMElement($doc_xml->getElementById('title'));
What's wrong? Is it the whole HTML-to-XML process? I found some examples on the web, and it should work... Thanks!
Upvotes: 3
Views: 100
Reputation: 880
Solved! Simply:
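$doc_html = new DOMDocument();
// loadHTML() expects markup, not a URL, so fetch the page source first
$doc_html->loadHTML(file_get_contents($url));
$doc_html->saveXML(); // returns the document as an XML string; the result is unused here
$nome = $doc_html->getElementsByTagName('h1');
foreach ($nome as $n) {
    echo $n->nodeValue, PHP_EOL;
}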
Maybe the code was too messy before. Thanks everybody for the answers!
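For anyone who still wants the lookup by id from the question, here is a minimal sketch. It assumes the element really carries id="title"; the $html string is a made-up stand-in for file_get_contents($url), and $titleText is a name introduced just for this example. As far as I can tell, the exception came from passing the result of getElementById() to the DOMElement constructor, which expects a tag name rather than a node.

```php
<?php
// Hypothetical sample markup standing in for file_get_contents($url).
$html = '<html><body><h1 id="title">My App</h1></body></html>';

$doc = new DOMDocument();
// Real pages are rarely valid; collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

// getElementById() already returns a DOMElement (or null when nothing matches),
// so it should not be wrapped in "new DOMElement(...)".
$title = $doc->getElementById('title');
$titleText = $title !== null ? $title->nodeValue : null;

echo $titleText, PHP_EOL;
```

Checking for null before reading nodeValue avoids a second error when the page has no such element.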
Upvotes: 2
Reputation: 37846
The best way is to use XPath queries:
http://php.net/manual/en/simplexmlelement.xpath.php
They are very fast.
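A rough sketch of what that could look like. Note that it uses DOMXPath rather than the SimpleXMLElement::xpath method linked above, since the question already works with DOMDocument. The sample markup, the id="title" attribute, and the $headings variable are assumptions made for illustration.

```php
<?php
// Hypothetical sample markup; in practice this would be the fetched page source.
$html = '<html><body><h1 id="title">My App</h1><p>Other text</p></body></html>';

$doc = new DOMDocument();
// Suppress warnings caused by imperfect real-world HTML.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

// DOMXPath runs XPath expressions against an already loaded DOMDocument.
$xpath = new DOMXPath($doc);

// Collect the text of every <h1> whose id attribute is "title".
$headings = [];
foreach ($xpath->query('//h1[@id="title"]') as $node) {
    $headings[] = trim($node->nodeValue);
}

echo implode(PHP_EOL, $headings), PHP_EOL;
```

The expression can be changed freely, for example to '//h1' for all top-level headings, without touching the rest of the code.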
Upvotes: 0
Reputation: 25698
I would go for a preg_match() solution to extract the content you need rather than parsing the whole document as XML. In particular, if the document becomes invalid for some reason, the XML parsing approach will no longer give you your information.
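Something along these lines, as a minimal sketch. The sample markup and the choice of the first h1 element as the target are assumptions, and $heading is just an illustrative name.

```php
<?php
// Hypothetical sample markup; in practice this would be the fetched page source.
$html = '<html><body><h1 id="title">My App</h1></body></html>';

// Match the first <h1 ...>...</h1> pair.
// "i" makes the match case-insensitive, "s" lets "." span line breaks,
// and ".*?" is lazy so the match stops at the first closing tag.
$heading = null;
if (preg_match('/<h1\b[^>]*>(.*?)<\/h1>/is', $html, $matches) === 1) {
    // Drop any inline tags inside the heading and surrounding whitespace.
    $heading = trim(strip_tags($matches[1]));
}

echo $heading, PHP_EOL;
```

Keep in mind that the pattern is tied to how the markup is written, so it may need adjusting if the page layout changes.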
Upvotes: 1
Reputation: 2494
You need to define XML entities for the special characters that you're using in your HTML. It must be the same kind of problem as the one described here: DOMDocument::loadXML vs. HTML Entities
Upvotes: 1