Reputation: 43
I have this strange problem parsing XML document in PHP loaded via cURL. I cannot get nodeValue containing URL address (I'm trying to implement simple RSS reader into my CMS). Strange thing is that it works for every node except that containing url addresses and date ( and ).
Here is the code (I know it is a stupid solution, but I'm kinda newbie in working with DOM and parsing XML documents).
function file_get_contents_curl($url) {
$ch = curl_init(); // initialize curl handle
curl_setopt($ch, CURLOPT_URL, $url); // set url to post to
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return into a variable
curl_setopt($ch, CURLOPT_TIMEOUT, 4); // times out after 4s
$result = curl_exec($ch); // run the whole process
return $result;
}
function vypis($adresa) {
$html = file_get_contents_curl($adresa);
$doc = new DOMDocument();
@$doc->loadHTML($html);
$nodes = $doc->getElementsByTagName('title');
$desc = $doc->getElementsByTagName('description');
$ctg = $doc->getElementsByTagName('category');
$pd = $doc->getElementsByTagName('pubDate');
$ab = $doc->getElementsByTagName('link');
$aut = $doc->getElementsByTagName('author');
for ($i = 1; $i < $desc->length; $i++) {
$dsc = $desc->item($i);
$titles = $nodes->item($i);
$categorys = $ctg->item($i);
$pubDates = $pd->item($i);
$links = $ab->item($i);
$autors = $aut->item($i);
$description = $dsc->nodeValue;
$title = $titles->nodeValue;
$category = $categorys->nodeValue;
$pubDate = $pubDates->nodeValue;
$link = $links->nodeValue;
$autor = $autors->nodeValue;
echo 'Title:' . $title . '<br/>';
echo 'Description:' . $description . '<br/>';
echo 'Category:' . $category . '<br/>';
echo 'Datum ' . gmdate("D, d M Y H:i:s",
strtotime($pubDate)) . " GMT" . '<br/>';
echo "Autor: $autor" . '<br/>';
echo 'Link: ' . $link . '<br/><br/>';
}
}
Can you please help me with this?
Upvotes: 3
Views: 1929
Reputation: 173562
To read RSS you shouldn't use loadHTML
, but loadXML
. One reason why your links don't show is because the <link>
tag in HTML ignores its contents. See also here: http://www.w3.org/TR/html401/struct/links.html#h-12.3
Also, I find it easier to just iterate over the <item>
tags and then iterate over their children nodes. Like so:
$d = new DOMDocument;
// don't show xml warnings
libxml_use_internal_errors(true);
$d->loadXML($xml_contents);
// clear xml warnings buffer
libxml_clear_errors();
$items = array();
// iterate all item tags
foreach ($d->getElementsByTagName('item') as $item) {
$item_attributes = array();
// iterate over children
foreach ($item->childNodes as $child) {
$item_attributes[$child->nodeName] = $child->nodeValue;
}
$items[] = $item_attributes;
}
var_dump($items);
Upvotes: 2