xRobot

Reputation: 26565

How to create a Google Reader?

I need to create a web tool like Google Reader for my college project.

I have two questions about it:

1) How does Google Reader track read and unread posts?

2) Does Google Reader save every post in the database, or does it load the feeds on demand?

Upvotes: 1

Views: 834

Answers (4)

Ahmad Azimi

Reputation: 693

You can use selfoss, a multipurpose RSS reader, live stream, mashup, and aggregation web application.

Features:

  • web based rss reader
  • universal aggregator
  • open source and free
  • easily extendable with an open plugin system (write your own data connectors)
  • mobile support (Android, iOS, iPad)
  • use selfoss to live stream and collect all your posts, tweets, feeds in one place
  • lightweight PHP application with less than 2 MB
  • supports MySQL, PostgreSQL and SQLite databases
  • OPML Import
  • easy installation: upload and run
  • with restful json api

Web site: http://selfoss.aditu.de/

GitHub: https://github.com/SSilence/selfoss

Upvotes: 0

sangupta

Reputation: 2406

Not sure if this helps now, but for others who drop by, I wrote up my thoughts in a detailed design:

Designing a Scalable Google Reader Clone

Upvotes: 2

Piskvor left the building

Reputation: 92792

Re #2: Google has a special RSS crawler bot called FeedFetcher. When you request an RSS feed, the bot is dispatched to retrieve it and stores the feed in its global (all-user) cache, keyed by URL. The next time the feed is requested (even by a different user, as long as the URL matches), it is loaded from the cache.

I'm not sure what the cache invalidation mechanisms are, but the crawler definitely doesn't revisit feeds as often as the response's Cache-Control headers would indicate (that's probably a good thing, as many generated RSS feeds send no-cache although they don't change very often). This internal cache doesn't seem to persist for longer than a few hours, though.

(These are hypotheses I formulated some time ago from my RSS feed access logs; I still think they're valid, as I haven't seen any major change in the crawler's behavior since.)
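
For a college project, the behavior described above could be approximated with a small URL-keyed cache that all users share. This is only a sketch of the idea, not how Google implements it: the names `FeedCache`, `fetch`, `ttl_seconds` and `clock` are made up for illustration, and the three-hour default is an assumption standing in for "a few hours".

```python
import time
from typing import Callable, Dict, Tuple


class FeedCache:
    """Shared (all-user) feed cache keyed by feed URL, with a fixed lifetime.

    Hypothetical helper: `fetch` is whatever function downloads a feed body.
    """

    def __init__(
        self,
        fetch: Callable[[str], str],
        ttl_seconds: float = 3 * 3600,  # assumed value for "a few hours"
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        # url -> (time the entry was stored, feed body)
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, url: str) -> str:
        """Return the cached feed for `url`, fetching it if missing or expired."""
        now = self._clock()
        entry = self._entries.get(url)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # still fresh: serve from the shared cache
        body = self._fetch(url)  # first request, or the entry expired
        self._entries[url] = (now, body)
        return body
```

Because the key is the URL alone, two users subscribed to the same feed trigger only one download per lifetime window. Note that this sketch deliberately ignores the feed's own Cache-Control headers.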

Upvotes: 2

Femaref

Reputation: 61497

  1. Assign a hash to each feed post (i.e. date + URL + ??? = a hash that identifies a single post).
  2. My guess is that it loads them on the fly, and maybe caches a limited number per user.
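
A minimal sketch of point 1, assuming the hash is built from just the post's date and URL (any further ingredients are left open here), and that read state is kept as a set of post hashes per user. `post_id` and `ReadTracker` are hypothetical names; a real project would keep the read set in a database table rather than in memory.

```python
import hashlib
from typing import Dict, Iterable, List, Set


def post_id(published: str, url: str) -> str:
    """Derive a stable identifier for one post from its date and URL."""
    key = published + "\n" + url
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


class ReadTracker:
    """Remembers, per user, which post identifiers have been read."""

    def __init__(self) -> None:
        self._read: Dict[str, Set[str]] = {}  # user -> set of read post ids

    def mark_read(self, user: str, pid: str) -> None:
        self._read.setdefault(user, set()).add(pid)

    def is_read(self, user: str, pid: str) -> bool:
        return pid in self._read.get(user, set())

    def unread(self, user: str, pids: Iterable[str]) -> List[str]:
        """Filter `pids` down to those the user has not read, keeping order."""
        return [p for p in pids if not self.is_read(user, p)]
```

Since the identifier depends only on the post itself, the same post gets the same hash no matter which user loads the feed, so only the small per-user set of read hashes needs to be stored.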

Upvotes: 3
