Reputation: 503
I'm trying to parse an RSS feed and I am getting what appears to be an empty DOM Document object. My current code is:
$xml_url = "https://thehockeywriters.com/category/san-jose-sharks/feed/";
$curl = curl_init();
curl_setopt( $curl, CURLOPT_RETURNTRANSFER, 1 );
curl_setopt( $curl, CURLOPT_URL, $xml_url );
$xml = curl_exec( $curl );
curl_close( $curl );
//$xml = iconv('UTF-8', 'UTF-8//IGNORE', $xml);
//$xml = utf8_encode($xml);
$document = new DOMDocument;
$document->loadXML( $xml );
if( ini_get('allow_url_fopen') ) {
echo "allow url fopen? Yes";
}
echo "<br />";
var_dump($document);
$items = $document->getElementsByTagName("item");
foreach ($items as $item) {
$title = $item->getElementsByTagName('title');
echo $title;
}
$url = 'https://thehockeywriters.com/category/san-jose-sharks/feed/';
$xml = simplexml_load_file($url);
foreach ($items as $item) {
$title = $item->title;
echo $title;
}
print_r($xml);
echo "<br />";
var_dump($xml);
echo "<br />hello?";
This code contains two separate attempts at parsing the same URL, based on answers and suggestions given in the following examples found on Stack Overflow:
Example 1
Example 2
Things I have tried or looked up:
1. Checked that allow_url_fopen is enabled
2. Made sure that the feed is UTF-8 encoded
3. Validated the XML
4. Code examples provided on previously linked Stack Overflow posts
Here is my current output with the var_dumps and echoes:
allow url fopen? Yes
object(DOMDocument)#2 (34) { ["doctype"]=> NULL ["implementation"]=> string(22) "(object value omitted)"
["documentElement"]=> NULL ["actualEncoding"]=> NULL ["encoding"]=> NULL
["xmlEncoding"]=> NULL ["standalone"]=> bool(true) ["xmlStandalone"]=> bool(true)
["version"]=> string(3) "1.0" ["xmlVersion"]=> string(3) "1.0"
["strictErrorChecking"]=> bool(true) ["documentURI"]=> NULL ["config"]=> NULL
["formatOutput"]=> bool(false) ["validateOnParse"]=> bool(false) ["resolveExternals"]=> bool(false)
["preserveWhiteSpace"]=> bool(true) ["recover"]=> bool(false) ["substituteEntities"]=> bool(false)
["nodeName"]=> string(9) "#document" ["nodeValue"]=> NULL ["nodeType"]=> int(9) ["parentNode"]=> NULL
["childNodes"]=> string(22) "(object value omitted)" ["firstChild"]=> NULL ["lastChild"]=> NULL
["previousSibling"]=> NULL ["attributes"]=> NULL ["ownerDocument"]=> NULL ["namespaceURI"]=> NULL
["prefix"]=> string(0) "" ["localName"]=> NULL ["baseURI"]=> NULL ["textContent"]=> string(0) "" }
bool(false)
hello?
Upvotes: 0
Views: 568
Reputation: 19528
The only issue I had with your code was that, without a user agent defined, the server returned a 403 error when I requested the feed.
In the future, you could use curl_getinfo to extract the status code of the request and check it against 200, which means OK, to make sure the request did not fail:
$httpcode = curl_getinfo($curl, CURLINFO_HTTP_CODE);
Aside from that, there were a few mistakes within your loops.
With SimpleXML:
<?php
$url = "https://thehockeywriters.com/category/san-jose-sharks/feed/";
$curl = curl_init();
curl_setopt($curl, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.2; WOW64; rv:17.0) Gecko/20100101 Firefox/17.0");
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl, CURLOPT_URL, $url);
$data = curl_exec($curl);
$httpcode = curl_getinfo($curl, CURLINFO_HTTP_CODE);
curl_close($curl);
if ($httpcode !== 200)
{
echo "Failed to retrieve feed... Error code: $httpcode";
die();
}
$feed = new SimpleXMLElement($data);
// list all titles...
foreach ($feed->channel->item as $item)
{
echo $item->title, "<br>\n";
}
With DOMDocument:
<?php
$url = "https://thehockeywriters.com/category/san-jose-sharks/feed/";
$curl = curl_init();
curl_setopt($curl, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.2; WOW64; rv:17.0) Gecko/20100101 Firefox/17.0");
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl, CURLOPT_URL, $url);
$data = curl_exec($curl);
$httpcode = curl_getinfo($curl, CURLINFO_HTTP_CODE);
curl_close($curl);
if ($httpcode !== 200)
{
echo "Failed to retrieve feed... Error code: $httpcode";
die();
}
$xml = new DOMDocument();
$xml->loadXML($data);
// list all titles...
foreach ($xml->getElementsByTagName("item") as $item)
{
foreach ($item->getElementsByTagName("title") as $title)
{
echo $title->nodeValue, "<br>\n";
}
}
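If loadXML is handed something that is not XML (an HTML error page, for instance), it returns false and the document stays empty, which matches the var_dump in the question. As a rough sketch, you could ask libxml for the reasons. The $data string below is a made-up stand-in for a bad response body, not what the server actually sent:

```php
<?php
// Collect libxml errors internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// Assumption: a made-up, non-XML body standing in for a failed response.
$data = "<html><body><p>403 Forbidden</body></html>";

$xml = new DOMDocument();
$loaded = $xml->loadXML($data);

$messages = [];
if ($loaded === false) {
    foreach (libxml_get_errors() as $error) {
        // Each entry is a LibXMLError with line, column and message properties.
        $messages[] = "Line {$error->line}: " . trim($error->message);
    }
    // Empty the error buffer so later parses start clean.
    libxml_clear_errors();
}

echo $loaded ? "Parsed OK" : "Could not parse feed:\n" . implode("\n", $messages), "\n";
```

Checking the return value of loadXML together with the HTTP status code should tell you whether the problem is the request or the markup.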
If you just want to print the title/description of all items:
foreach ($feed->channel->item as $item)
{
echo $item->title;
echo $item->description;
// uncomment the below line to print only the first entry.
// break;
}
If you want just the first entry, without using a foreach:
echo $feed->channel->item[0]->title;
echo $feed->channel->item[0]->description;
Saving title and description to an array for later use:
$result = [];
foreach ($feed->channel->item as $item)
{
$result[] =
[
'title' => (string)$item->title,
'description' => (string)$item->description
];
// could make a key => value alternatively from the above with
// title as key like this:
// $result[(string)$item->title] = (string)$item->description;
}
Foreach with MySQLi/PDO prepared statement:
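foreach ($feed->channel->item as $item)
{
// cast to plain strings first: bind_param() binds its arguments by reference
$title = (string)$item->title;
$description = (string)$item->description;
// MySQLi ($stmt is assumed to have been prepared beforehand)
$stmt->bind_param('ss', $title, $description);
$stmt->execute();
// PDO
//$stmt->bindValue(':title', $title, PDO::PARAM_STR);
//$stmt->bindValue(':description', $description, PDO::PARAM_STR);
//$stmt->execute();
}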
Upvotes: 1
Reputation: 503
I selected Prix's answer for pointing out the user agent definition, but I came up with another way of doing the loop that avoids nested loops and makes it easier to access other nodes. Here is what I am using (DOMDocument solution):
$xml_url = "https://thehockeywriters.com/category/san-jose-sharks/feed/";
$curl = curl_init();
curl_setopt( $curl, CURLOPT_RETURNTRANSFER, 1 );
curl_setopt( $curl, CURLOPT_URL, $xml_url );
curl_setopt($curl, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.2; WOW64; rv:17.0) Gecko/20100101 Firefox/17.0");
$xml = curl_exec( $curl );
curl_close( $curl );
$document = new DOMDocument;
$document->loadXML( $xml );
$items = $document->getElementsByTagName("item");
foreach ($items as $item) {
$title = $item->getElementsByTagName('title')->item(0)->nodeValue;
echo $title;
$desc = $item->getElementsByTagName('description')->item(0)->nodeValue;
echo $desc;
}
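One caveat with this approach: item(0) returns null when an item has no such child, so reading nodeValue on it fails. A minimal sketch of a guard follows. The first_text helper and the sample feed are made up for illustration and are not part of DOMDocument or the real feed:

```php
<?php
// Hypothetical helper: text of the first <$tag> descendant, or '' if there is none.
function first_text(DOMElement $parent, string $tag): string
{
    $node = $parent->getElementsByTagName($tag)->item(0);
    return $node === null ? '' : trim($node->nodeValue);
}

// Made-up sample feed so the sketch runs without a network request.
$sample = <<<XML
<rss version="2.0">
  <channel>
    <title>Sample feed</title>
    <item><title>First post</title><description>Has a description</description></item>
    <item><title>Second post</title></item>
  </channel>
</rss>
XML;

$document = new DOMDocument;
$document->loadXML($sample);

$rows = [];
foreach ($document->getElementsByTagName('item') as $item) {
    // A missing child yields an empty string instead of an error.
    $rows[] = [
        'title' => first_text($item, 'title'),
        'description' => first_text($item, 'description'),
    ];
}
print_r($rows);
```

The second sample item has no description, so its entry ends up with an empty string rather than stopping the loop.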
Upvotes: 1