Suriyan Suresh

Reputation: 3024

Parsing RSS Feeds with PHP

I need to aggregate RSS content from roughly 500 URLs. When I try to fetch content from those URLs, I get timeout/memory-exhausted errors (I am using the SimplePie library).

Is there any method or idea for pulling content quickly from this many sources?

How do I get fresh content every time?

<?php
require_once('include/simplepie.inc');

$urlList = array(
    'http://site1.com/index.rss',
    'http://site2.com/index.rss',
    'http://site3.com/index.rss',
    // ... roughly 500 feed URLs in total ...
    'http://site500.com/index.rss',
);

$feed = new SimplePie();
$feed->set_feed_url($urlList);   // pass all feed URLs at once
$feed->init();                   // fetch and parse the feeds
$feed->handle_content_type();    // send the correct Content-Type header
?>

HTML portion:

<?php  
foreach($feed->get_items() as $item):  
?>  
<div class="item">
<h2><a href="<?php echo $item->get_permalink(); ?>"><?php echo $item->get_title(); ?></a></h2>
<p><?php echo $item->get_description(); ?></p>
<p><small>Posted on <?php echo $item->get_date('j F Y | g:i a'); ?></small></p>
</div>
<?php endforeach; ?>

Upvotes: 1

Views: 591

Answers (4)

woodscreative

Reputation: 1031

In my experience, SimplePie isn't very good or robust. Try using simplexml_import_dom() instead.
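A minimal sketch of that approach, assuming a plain RSS 2.0 feed (the URL is just one of the placeholders from the question):

<?php
// Fetch one feed with DOMDocument and hand it to SimpleXML via
// simplexml_import_dom() instead of using SimplePie.
$url = 'http://site1.com/index.rss';

$dom = new DOMDocument();
// @ suppresses warnings from malformed markup; check the return value instead.
if (@$dom->load($url) === false) {
    die('Could not load feed: ' . $url);
}

$xml = simplexml_import_dom($dom);

// A typical RSS 2.0 layout is <rss><channel><item>...</item></channel></rss>.
foreach ($xml->channel->item as $item) {
    echo $item->title, ' - ', $item->link, "\n";
}
?>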

Upvotes: 1

symcbean

Reputation: 48357

Is there any method or idea for pulling content quickly from this many sources?

Trying to poll all 500 URLs synchronously is going to put a lot of stress on the system. This can be mitigated by running the transfers in parallel (using the curl_multi_* functions - but the version of SimplePie I've got here doesn't use these for multiple transfers).

Assuming the volume of requests for the composite feed merits it, the best solution would be to run a scheduler that downloads the feeds to your server whenever the current content is due to expire (applying a sensible minimum interval), then build the composite feed from the stored data. Note that if you take this approach you'll need to implement some careful semaphores or use a DBMS to store the data - PHP's file-locking semantics are not very sophisticated.
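A rough, illustrative sketch of the parallel-download part using curl_multi_* (this is not SimplePie code, and the URL list is just the placeholders from the question):

<?php
// Fetch many feeds in parallel instead of one after another.
$urlList = array(
    'http://site1.com/index.rss',
    'http://site2.com/index.rss',
    // ... the rest of the feed URLs ...
);

$multi   = curl_multi_init();
$handles = array();

foreach ($urlList as $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);          // don't let one slow host block everything
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_multi_add_handle($multi, $ch);
    $handles[$url] = $ch;
}

// Run all transfers until they have finished.
$running = null;
do {
    curl_multi_exec($multi, $running);
    curl_multi_select($multi); // wait for activity instead of busy-looping
} while ($running > 0);

// Collect the raw XML for each feed, then parse/store it as needed.
$feedXml = array();
foreach ($handles as $url => $ch) {
    $feedXml[$url] = curl_multi_getcontent($ch);
    curl_multi_remove_handle($multi, $ch);
    curl_close($ch);
}
curl_multi_close($multi);
?>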

Upvotes: 1

Julien Genestoux

Reputation: 32982

I think you're doing it wrong. If you want to parse that many feeds, you cannot do it from a script that is called via a web server.

If you really want to do the polling, you will have to run that script through, say, cron, and then save the results to be served by another PHP script (which can be called by the HTTP server).
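A rough sketch of that split (the file names and cache path are made up):

<?php
// poll_feeds.php - run from cron (e.g. every 15 minutes), never from the web server.
require_once('include/simplepie.inc');

$urlList = array(
    'http://site1.com/index.rss',
    // ... the same ~500 feed URLs as in the question ...
);

$feed = new SimplePie();
$feed->set_feed_url($urlList);
$feed->init();

$items = array();
foreach ($feed->get_items() as $item) {
    $items[] = array(
        'link'  => $item->get_permalink(),
        'title' => $item->get_title(),
        'date'  => $item->get_date('j F Y | g:i a'),
    );
}

// Save the aggregated items so the web-facing script never has to poll.
file_put_contents('/var/cache/feeds/items.json', json_encode($items));
?>

<?php
// show_feeds.php - the script actually served over HTTP only reads the cache.
$items = json_decode(file_get_contents('/var/cache/feeds/items.json'), true);
foreach ($items as $item) {
    echo '<h2><a href="', htmlspecialchars($item['link']), '">',
         htmlspecialchars($item['title']), '</a></h2>';
}
?>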

However, you will still have to deal with the inherent limitations of polling: 99% of the time you will find no new content, wasting your CPU and bandwidth as well as those of the servers you're polling. You will also have to deal with dead feeds, invalid ones, rate limiting, etc.

Implement the PubSubHubbub protocol. It will help with the feeds that have implemented it, so that you just have to wait for the data to be pushed to you.
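For feeds that advertise a hub, the subscription side is just an HTTP POST. A rough sketch (the hub, topic, and callback URLs are placeholders; your callback script then receives the pushed entries):

<?php
// Subscribe to one feed ("topic") at its PubSubHubbub hub.
$hub      = 'http://pubsubhubbub.appspot.com/';
$topic    = 'http://site1.com/index.rss';
$callback = 'http://example.com/push_callback.php';

$ch = curl_init($hub);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query(array(
    'hub.mode'     => 'subscribe',
    'hub.topic'    => $topic,
    'hub.callback' => $callback,
    'hub.verify'   => 'async',
)));
curl_exec($ch);
curl_close($ch);
?>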

For the other feeds, you can either do the polling yourself, as you do now, and find a way to handle the individual errors (invalid XML, dead hosts, etc.), or rely on a service like Superfeedr (I created it).

Upvotes: 2

Martin Wickman

Reputation: 19905

Increase memory_limit in php.ini (memory_limit = xxM), or use ini_set("memory_limit", "xxM") at runtime, where xx is the new memory limit.
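For example (the 256M value is only an illustration; set_time_limit(0) additionally lifts the execution-time limit behind the timeout errors):

<?php
// Raise the limits at runtime, before SimplePie does any work.
ini_set('memory_limit', '256M');
set_time_limit(0);

require_once('include/simplepie.inc');
$feed = new SimplePie();
// ... rest of the code from the question ...
?>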

Upvotes: 0
