user717452

Reputation: 111

Removing Item From RSS Error

I use the following PHP to remove items from an XML file I own if they are more than 8 days old. It worked fine once before, but now it gives me this error message:

Call to a member function removeChild() on a non-object in /Users//DateTest-3.php on line 40

Line 40 is

$node->parentNode->removeChild($node);

Any ideas why this is throwing the error?

<?php

$rss = new DOMDocument();
$url = 'http://URL.com/Test.xml';
$rss->load($url);
$feed = array();
foreach ($rss->getElementsByTagName('item') as $node) {
    $item = array ( 
        'title' => $node->getElementsByTagName('title')->item(0)->nodeValue,
        'desc' => $node->getElementsByTagName('description')->item(0)->nodeValue,
        'link' => $node->getElementsByTagName('link')->item(0)->nodeValue,
        'date' => $node->getElementsByTagName('date')->item(0)->nodeValue,
    );
    array_push($feed, $item);
}

$limit = 50;
for ($i = 0; $i < count($feed); $i++) {
    date_default_timezone_set('America/Los_Angeles');
    $newDate = strtotime("-8 day");
    $date = strtotime($feed[$i]['date']);
    if ($date > $newDate) {
        echo "Don't delete";
    } else {
        echo "Delete";
        $node->parentNode->removeChild($node);
    }
}

$rss->save("Test.xml")




?>

Upvotes: 1

Views: 393

Answers (3)

Jens A. Koch

Reputation: 41737

  • In RSS 1.0 there is no 'date' element on items, but 'dc:date' is available. http://web.resource.org/rss/1.0/spec#s5.5

  • In RSS 2.0 there is no 'date' element either; items use 'pubDate'. http://cyber.law.harvard.edu/rss/rss.html#hrelementsOfLtitemgt

  • Decide whether you want to look for 'date', 'dc:date' or 'pubDate'. The following code works with pubDate.

  • $limit = 50; was unused, so I dropped it.

  • Removing nodes from a node list while iterating over it will not work. This is a well-known problem; see the comments here: http://php.net/manual/de/domnode.removechild.php The solution is to collect the nodes to remove in a queue and remove them after the loop.

  • I have taken the liberty of reworking the code a bit. I intentionally left the debug output active, mainly for the date comparison and for displaying the reduced list. The code is commented.

  • Please adjust the feed URL and the "-x days" value in the condition. I had to work with a public RSS feed to test things.

--

<?php

date_default_timezone_set('America/Los_Angeles');

$feed = array(); // target array for filtered items

$nodesToRemoveQueue = array(); // stores all nodes to remove

$rss = new DOMDocument();
$url = 'http://rss.nytimes.com/services/xml/rss/nyt/Space.xml';
$rss->load($url);

$nodeList = $rss->getElementsByTagName('item');

foreach ($nodeList as $node)
{
    $pubDate = $node->getElementsByTagName('pubDate')->item(0)->nodeValue;

    // if the date in the XML feed is older than the desired number of days, queue the node
    // for removal and continue with the next item (do not copy the data into the $feed array)
    if(isDateOlderThenDays($pubDate, '-5 days')) {
        echo 'Removed ' . $pubDate . '<br>';
        // $node->parentNode->removeChild($node); this won't work!!
        $nodesToRemoveQueue[] = $node; // put node in queue, remove later
        continue;
    }

    echo 'Kept ' . $pubDate . '<br>';

    // build item for $feed array, then add item to $feed array
    $item = array (
        'title' => $node->getElementsByTagName('title')->item(0)->nodeValue,
        'desc' => $node->getElementsByTagName('description')->item(0)->nodeValue,
        'link' => $node->getElementsByTagName('link')->item(0)->nodeValue,
        'date' => $pubDate,
    );

    $feed[] = $item;
}

// helper to compare dates
function isDateOlderThenDays($date, $days)
{
    // true when the pubDate ($date) is earlier (older) than the relative offset $days, e.g. '-5 days'
    return strtotime($date) < strtotime($days);
}

// feed array contains all the not "outdated" items
var_dump($feed);

// finally: remove the "outdated" nodes
foreach($nodesToRemoveQueue as $node){
  $node->parentNode->removeChild($node);
}

// node list reduction check: this should display only the dates that were kept
$nodeList = $rss->getElementsByTagName('item');
foreach ($nodeList as $node) {
    echo $node->getElementsByTagName('pubDate')->item(0)->nodeValue . '<br>';
}

// write reduced RSS XML to file
$rss->save(__DIR__.'/Test.xml');

Another way of saving the XML is:

$xmlString = $rss->saveXML();
file_put_contents(__DIR__.'/Test.xml', $xmlString);
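
As an alternative to the explicit queue, the live node list can be copied into a plain array before looping. The following is only a minimal sketch of that variant, not the code from the answer above: the inline feed, the 8-day cutoff and the variable names are assumptions chosen so that it runs without a network call.

```php
<?php
// Hypothetical in-memory feed (assumption): two old items and one dated "now".
$fresh = date(DATE_RSS);
$xml = <<<XML
<rss version="2.0"><channel>
  <item><title>old 1</title><pubDate>Mon, 01 Jan 2001 00:00:00 +0000</pubDate></item>
  <item><title>old 2</title><pubDate>Tue, 02 Jan 2001 00:00:00 +0000</pubDate></item>
  <item><title>new</title><pubDate>{$fresh}</pubDate></item>
</channel></rss>
XML;

$rss = new DOMDocument();
$rss->loadXML($xml);

$cutoff = strtotime('-8 days');

// iterator_to_array() copies the live DOMNodeList into a plain array,
// so removing nodes from the document no longer disturbs the iteration.
foreach (iterator_to_array($rss->getElementsByTagName('item')) as $node) {
    $pubDate = $node->getElementsByTagName('pubDate')->item(0)->nodeValue;
    if (strtotime($pubDate) < $cutoff) {
        $node->parentNode->removeChild($node);
    }
}

// Count what is left in the document after the removals.
$remaining = $rss->getElementsByTagName('item')->length;
echo $remaining, "\n";
```

The effect is the same as with the queue: the loop walks over a snapshot, and only the document itself shrinks.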

Upvotes: 1

schemar

Reputation: 622

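In your second loop (the for loop), reassign $node on every iteration so that it refers to the item being checked, e.g. $node = $feed[$i]. For that to work, $feed[$i] has to hold the DOM node itself rather than the array of strings built in the first loop.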
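
One way to make that reassignment possible is to store each DOM node next to its extracted data. This is a rough sketch under assumptions: the inline sample feed and the 'node' array key are made up for illustration, while the 'date' element name and the 8-day limit follow the question.

```php
<?php
// Hypothetical sample feed (assumption); element names follow the question's code.
$rss = new DOMDocument();
$rss->loadXML(
    '<rss><channel>'
    . '<item><title>old</title><date>2001-01-01</date></item>'
    . '<item><title>fresh</title><date>' . date('Y-m-d') . '</date></item>'
    . '</channel></rss>'
);

$feed = array();
foreach ($rss->getElementsByTagName('item') as $node) {
    $feed[] = array(
        'title' => $node->getElementsByTagName('title')->item(0)->nodeValue,
        'date'  => $node->getElementsByTagName('date')->item(0)->nodeValue,
        'node'  => $node, // hypothetical key: keep the DOM node next to its data
    );
}

$cutoff  = strtotime('-8 days');
$deleted = array();
foreach ($feed as $item) {
    if (strtotime($item['date']) < $cutoff) {
        // Remove the node that belongs to this entry, not the leftover $node.
        $item['node']->parentNode->removeChild($item['node']);
        $deleted[] = $item['title'];
    }
}

echo implode(', ', $deleted), "\n";
```

Because the removals happen in a second loop over a plain array, this also avoids modifying the node list while it is being iterated.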

Upvotes: 0

schemar

Reputation: 622

Is it on purpose that you only work on the last node after this loop?

foreach ($rss->getElementsByTagName('item') as $node)

After the loop, $node still holds the last item assigned from $rss->getElementsByTagName('item'). Or is some code missing?
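
A small sketch of that behaviour, using a made-up two-item document (an assumption, not the asker's feed). It would also explain the reported error if more than one item is older than the limit: once the last node has been removed, its parentNode is null, so the next removeChild() call has no object to run on.

```php
<?php
// Hypothetical two-item document (assumption) to show what $node holds after a foreach.
$doc = new DOMDocument();
$doc->loadXML('<channel><item>a</item><item>b</item></channel>');

foreach ($doc->getElementsByTagName('item') as $node) {
    // Nothing to do here; we only care about $node after the loop.
}

// $node still exists here and refers to the last <item>.
$lastValue = $node->nodeValue;

$node->parentNode->removeChild($node);    // the first removal succeeds
$detached = ($node->parentNode === null); // the removed node has no parent any more

// A second $node->parentNode->removeChild($node) would now call
// removeChild() on null, which matches the error in the question.
var_dump($lastValue, $detached);
```

If only one item were old enough to delete, the single removal would succeed, which may be why the script appeared to work once.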

Upvotes: 0
