Reputation: 5088
I understand somewhat how to use the XML package to read and parse an XML file, such as a piece of an RSS feed. However, what is the basic setup for continuously reading an RSS feed?
For example, imagine that I want to set up a facility that continuously reads the feed from http://evemaps.dotlan.net/feed/sovereignty and stores the data in some kind of R data structure (say, a data.frame). I imagine that I would need to do something like the following:
data.frame which grows by each entry added
However, this is still a rather vague picture. What are the basic packages and functions that I would need to string together to make this work? That is, what basic steps would I need to put in place to create such a facility? I'm not looking for anyone to write this facility for me (even though that would be nice!). Rather, I'm trying to understand which overall steps are involved.
Upvotes: 0
Views: 924
Reputation: 887
I think you're looking for PubSubHubbub.
With an RSS client (i.e., your R application on AWS) you have two choices: polling or PubSubHubbub (aka webhooks, PuSH, and others). As mentioned here, with polling you may be throttled if you exceed a publisher's maximum-pings policy. With PuSH, the publisher's server notifies your R application in real time when there is a new update, because it works as a subscription.
The SO answer linked above leads to the blog of Superfeedr, a popular pay-as-you-go hub provider, and to a post which describes the PuSH protocol's workflow and shows a command-line implementation.
You can hear more about the protocol from this Google I/O 2010 presentation by one of the engineers who crafted PuSH.
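For the polling option mentioned above, the overall steps (fetch, parse, append only unseen entries, wait, repeat) could be sketched roughly as follows. This is only a minimal sketch under some assumptions: that the feed is plain RSS 2.0 with title, link, and pubDate elements inside each item, and that the link is a usable unique key. The helper names parse_rss, append_new, and poll_feed are made up for illustration, and the five-minute interval is an arbitrary guess, not a documented limit of any publisher.

```r
library(XML)

# Parse RSS text into a data.frame with one row per <item>.
# Assumes RSS 2.0 element names; missing fields become NA.
parse_rss <- function(rss_text) {
  doc <- xmlParse(rss_text, asText = TRUE)
  items <- getNodeSet(doc, "//item")
  field <- function(node, name) {
    value <- xpathSApply(node, paste0("./", name), xmlValue)
    if (length(value) == 0) NA_character_ else value[[1]]
  }
  data.frame(
    title = vapply(items, field, character(1), name = "title"),
    link  = vapply(items, field, character(1), name = "link"),
    date  = vapply(items, field, character(1), name = "pubDate"),
    stringsAsFactors = FALSE
  )
}

# Append only the rows whose link is not already in the store.
# store may be NULL on the first call.
append_new <- function(store, fresh) {
  rbind(store, fresh[!fresh$link %in% store$link, , drop = FALSE])
}

# Hypothetical polling loop: fetch, parse, merge, sleep.
# interval_sec is an assumed polite delay between requests.
poll_feed <- function(url, interval_sec = 300, max_polls = Inf) {
  store <- NULL
  polls <- 0
  while (polls < max_polls) {
    rss_text <- paste(readLines(url, warn = FALSE), collapse = "\n")
    store <- append_new(store, parse_rss(rss_text))
    polls <- polls + 1
    if (polls < max_polls) Sys.sleep(interval_sec)
  }
  store
}
```

Something like poll_feed("http://evemaps.dotlan.net/feed/sovereignty", max_polls = 3) would then return the accumulated data.frame after three fetches. Growing a data.frame with rbind is fine for small feeds; for a long-running process you would probably want to write each batch to disk or a database instead.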
Upvotes: 1