Reputation: 431

How does an RSS reader know that a feed is updated?

I'm just learning about RSS via YouTube, but I could not find an answer to my question of how a reader knows there is an update.

Is it like push on a BlackBerry?

Upvotes: 43

Answers (6)

Daniel

Reputation: 516

If you look at a Google RSS feed, it includes a last-build date, e.g.:

<lastBuildDate>Fri, 27 Sep 2024 11:15:17 GMT</lastBuildDate>

<lastBuildDate>Fri, 27 Sep 2024 11:18:31 GMT</lastBuildDate>

<lastBuildDate>Fri, 27 Sep 2024 11:23:16 GMT</lastBuildDate>

<lastBuildDate>Fri, 27 Sep 2024 11:25:24 GMT</lastBuildDate>

Typically a cron job is set up to query a feed at an interval that depends on how often its items are likely to be added or updated. The resource may be cached, but from what I can tell this feed is regenerated every 2 to 3 minutes (see the timestamps above).
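A minimal sketch of how a scheduled job could use that timestamp, assuming an RSS 2.0 document with a `<channel>` element directly under the root. The function names `last_build_date` and `feed_was_rebuilt` are made up for illustration, and not every feed sets `lastBuildDate`, so the sketch treats a missing date as "changed".

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime


def last_build_date(feed_xml):
    """Return the channel's <lastBuildDate> as a datetime, or None if absent."""
    root = ET.fromstring(feed_xml)
    text = root.findtext("channel/lastBuildDate")
    return parsedate_to_datetime(text.strip()) if text else None


def feed_was_rebuilt(feed_xml, previous):
    """Compare the feed's build date with the one stored from the last run."""
    current = last_build_date(feed_xml)
    # Without a date on either side we cannot tell, so assume it changed.
    if current is None or previous is None:
        return True
    return current > previous
```

The job would store the returned date after each run and pass it back in as `previous` on the next one.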

What is interesting is that the items in the feed can vary. An item included in one fetch is not guaranteed to be there the next time the feed is fetched. In other words, the feed generation seems to be designed for speed rather than consistency; e.g. I have seen an item present one moment, gone the next, and then back again.

One other challenge with feeds is that a single resource or URL may appear in a feed under more than one guid. For example, a resource is sometimes accessed both with and without a query string. If Google indexes the page both ways, it may generate more than one feed item, each with a different guid, pointing to the same resource. This can result in the same article being imported more than once. In other words, a guid is not necessarily unique to an endpoint or resource.
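One way to guard against such duplicates is to key imported items on a normalized link instead of the guid. The sketch below is only an illustration: `canonical_link` and `dedupe_items` are hypothetical helpers, items are assumed to be dicts with a `"link"` key, and dropping the query string is an assumption that breaks on sites where the query string identifies the article.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_link(url):
    """Drop the query string and fragment so URL variants compare equal."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, "", ""))


def dedupe_items(items):
    """Keep the first item seen for each canonical link, in feed order."""
    seen = set()
    unique = []
    for item in items:
        key = canonical_link(item["link"])
        if key not in seen:
            seen.add(key)
            unique.append(item)
    return unique
```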

Upvotes: 0

Emil Vikström

Reputation: 91983

RSS is a pull technology. The reader re-fetches the RSS feed now and then (for example twice per hour, or more often if the reader learns that the feed is updated frequently).

The feed is served through regular HTTP and consists of a simple XML file. It is always fetched from the same URL.
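As a rough illustration of how a reader might "learn" a feed's update rhythm, here is a sketch of one possible rule: poll sooner after a fetch that found new items, and back off after a quiet one. The function name, the halve/double rule, and the 5-minute and 1-hour bounds are all assumptions, not how any particular reader works.

```python
def next_poll_interval(current, changed, minimum=300, maximum=3600):
    """Return the next wait in seconds: halve after a change, double otherwise."""
    proposed = current // 2 if changed else current * 2
    # Clamp so a busy feed is not hammered and a quiet one is not forgotten.
    return max(minimum, min(maximum, proposed))
```

Starting from 1800 seconds (twice per hour), a fetch that finds new items moves the next poll to 900 seconds, while a fetch that finds nothing moves it to 3600.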

Upvotes: 20

Gabriel

Reputation: 18780

RSS is a file format and doesn't actually know anything about where it gets the entries from. The real question is "how can an HTTP request get only the newest results from a server?", and the answer is Conditional GET. HTTP also supports Conditional PUT.

This is an article about using this feature of HTTP specifically to support RSS hackers.

Upvotes: 46

FarWest

Reputation: 461

Let's summarize:

Usually, a client learns that an RSS feed has been updated through polling, that is, a regular pull (an HTTP GET request on the feed URL).
Push doesn't exist on the web, at least not with HTTP, until HTML5 WebSocket is finalized.
However, some blog platforms and services, such as WordPress and Google, now support the PubSubHubbub convention. In this mode, you "subscribe" to the updates of an RSS feed. The "hub" then calls a URL on YOUR site (the callback URL) to send you updates: that is a push.
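To make the subscribe-and-callback flow a little more concrete, here is a hedged sketch of the two pieces a subscriber writes. The `hub.mode`, `hub.topic`, `hub.callback` and `hub.challenge` parameter names follow the PubSubHubbub (now WebSub) protocol as I understand it; the function names are hypothetical, and a given hub may require additional fields.

```python
from urllib.parse import parse_qs, urlencode


def subscription_form(topic_url, callback_url):
    """Build the form body that is POSTed to the hub to request a subscription."""
    return urlencode({
        "hub.mode": "subscribe",
        "hub.topic": topic_url,        # the feed you want updates for
        "hub.callback": callback_url,  # the URL on YOUR site the hub will call
    })


def verification_response(query_string, expected_topic):
    """Answer the hub's verification GET on the callback URL.

    Returns (status, body): echo the challenge to confirm the subscription,
    or 404 when the topic is not one we asked for.
    """
    params = parse_qs(query_string)
    topic = params.get("hub.topic", [""])[0]
    challenge = params.get("hub.challenge", [""])[0]
    if topic == expected_topic and challenge:
        return 200, challenge
    return 404, ""
```

Once the subscription is confirmed, the hub POSTs new feed content to the callback URL, so the callback handler takes the place of the polling loop.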

Push or pull, in both cases you still need to write some piece of code to update the RSS list on your site, database or wherever you store/display it.

As a side note, it is not necessary to request the whole XML on every pull to see whether the content has changed: using a mechanism that is not specific to RSS but part of HTTP itself (the ETag and Last-Modified headers), you can find out whether the feed was modified after a given date and download the whole XML only if it was.
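A minimal sketch of such a conditional fetch using only the Python standard library. The function names are made up for illustration, and it assumes the server actually sends `ETag` or `Last-Modified` and honours the matching `If-None-Match` and `If-Modified-Since` request headers; servers that don't will simply return the full feed every time.

```python
import urllib.error
import urllib.request


def build_conditional_request(url, etag=None, last_modified=None):
    """Attach the validators saved from the previous fetch, if any."""
    request = urllib.request.Request(url)
    if etag:
        request.add_header("If-None-Match", etag)
    if last_modified:
        request.add_header("If-Modified-Since", last_modified)
    return request


def fetch_if_changed(url, etag=None, last_modified=None):
    """Return (body, etag, last_modified); body is None on 304 Not Modified."""
    request = build_conditional_request(url, etag, last_modified)
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return (response.read(),
                    response.headers.get("ETag"),
                    response.headers.get("Last-Modified"))
    except urllib.error.HTTPError as error:
        if error.code == 304:  # unchanged: keep the cached copy and validators
            return None, etag, last_modified
        raise
```

The caller stores the returned ETag and Last-Modified values next to the cached feed and passes them back in on the next poll.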

Upvotes: 12