esseara

Reputation: 880

PHP - retrieve information from an XML file

I'm new to web programming. I need to retrieve some information from a web page. I have the URL of the page, so I want to get its HTML source, convert it to XML, and then use PHP's DOM functions to fetch the information I need.

My PHP code is this:

$url=$_POST['url']; //url

$doc_html=new DOMDocument();
$doc_html->loadHTML($url); //html page
$doc_xml=new DOMDocument();
$doc_xml->loadXML($doc_html->saveXML()); //xml converted page

$nome_app=new DOMElement($doc_xml->getElementById('title'));

echo $nome_app->nodeValue;

I get this fatal error:

Uncaught exception 'DOMException' with message 'Invalid Character Error' on this line:

$nome_app=new DOMElement($doc_xml->getElementById('title'));

What's wrong? Is it the entire HTML-to-XML process? I found some examples on the web, and they suggest this should work... Thanks!

Upvotes: 3

Views: 100

Answers (4)

esseara

Reputation: 880

Solved! Simply:

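$doc_html = new DOMDocument();
// loadHTML() expects markup, so fetch the page content from the URL first
$doc_html->loadHTML(file_get_contents($url));
// the saveXML() call was dropped: its return value was never used
$nome = $doc_html->getElementsByTagName('h1');
foreach ($nome as $n) {
    // print the text content of each <h1> element
    echo $n->nodeValue, PHP_EOL;
}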

Maybe the code was too messy before. Thanks everybody for the answers!
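For anyone hitting the same error, here is a minimal sketch of what seems to have gone wrong in the question. It assumes two things: that loadHTML() was parsing the URL string itself as markup, and that getElementById() therefore returned null, which new DOMElement(...) then rejected as an invalid tag name. The inline $html string is a hypothetical stand-in for the fetched page, so the snippet runs without network access.

```php
<?php
// Hypothetical markup standing in for file_get_contents($url).
$html = '<html><body><h1 id="title">My App</h1><p>Details</p></body></html>';

$doc = new DOMDocument();
// loadHTML() takes an HTML string, not a URL.
$doc->loadHTML($html);

// getElementById() already returns a DOMElement (or null when nothing
// matches), so it should not be wrapped in "new DOMElement(...)".
$title = $doc->getElementById('title');
$titleText = $title !== null ? $title->nodeValue : null;

echo $titleText, PHP_EOL;
```

With this approach the extra HTML-to-XML round trip through a second DOMDocument does not appear to be needed.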

Upvotes: 2

doniyor

Reputation: 37846

The best way is to use XPath queries:

http://php.net/manual/en/simplexmlelement.xpath.php

They are very fast.
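As a rough sketch of the XPath idea: the link above documents SimpleXMLElement::xpath(), but since the question already uses DOMDocument, this example uses DOMXPath instead. The $html string and the id value "title" are assumptions made for illustration, not taken from the real page.

```php
<?php
// Hypothetical markup; a real page would come from file_get_contents($url).
$html = '<html><body><h1 id="title">My App</h1><h1>Other</h1></body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);

// DOMXPath runs XPath expressions against an already loaded DOMDocument.
$xpath = new DOMXPath($doc);
$nodes = $xpath->query('//h1[@id="title"]');

// Collect the text content of every matching node.
$titles = [];
foreach ($nodes as $node) {
    $titles[] = $node->nodeValue;
}

print_r($titles);
```

The expression can be changed to match other elements or attributes without touching the rest of the code.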

Upvotes: 0

floriank

Reputation: 25698

I would go for a preg_match() solution to get the content you need rather than parsing the whole document as XML. In particular, if the document becomes invalid for some reason, an XML parser will no longer give you your data.
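A minimal sketch of that idea, assuming the value you want sits in the first h1 element (the markup and the pattern are illustrative only). Keep in mind that a regular expression like this is brittle on nested or unusual markup.

```php
<?php
// Hypothetical markup; a real page would come from file_get_contents($url).
$html = '<html><body><h1 id="title">My App</h1></body></html>';

// Capture everything between the first <h1 ...> and the next </h1>.
// "i" makes the match case-insensitive, "s" lets "." span newlines.
$pattern = '~<h1\b[^>]*>(.*?)</h1>~is';

$heading = null;
if (preg_match($pattern, $html, $matches) === 1) {
    // Remove any inline tags and surrounding whitespace from the capture.
    $heading = trim(strip_tags($matches[1]));
}

echo $heading, PHP_EOL;
```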

Upvotes: 1

Bgi

Reputation: 2494

You need to define XML entities for the special characters that you're using in your HTML. It is probably the same kind of problem as the one described here: DOMDocument::loadXML vs. HTML Entities

Upvotes: 1
