Amr Badawy

Reputation: 7673

How to make a web service that consumes about 500 RSS feeds and saves new items in a database?

I have a project where I need to build a service to which we will add about 500 RSS feeds from different sites. The service should collect new items from these feeds and save each item's Title and URL in my SQL Server database.

How can I determine the best architecture for this, and what code would help me get started?

Upvotes: 2

Views: 717

Answers (2)

Fredrik Mörk

Reputation: 158369

I am not going to go into details about implementation or detailed architecture here (mostly from lack of time at this particular moment), but I will say this:

  • It's not the web service itself that should consume the RSS feeds; it should merely be responsible for spawning the work to do so asynchronously.
  • You should not use threads from the ThreadPool to do this, for two reasons. One is that the work can be assumed to be more or less time consuming (the ThreadPool is recommended primarily for short-running tasks), and, perhaps more importantly, ThreadPool threads are used to serve incoming web requests; you don't want to compete with that.
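To illustrate the hand-off, here is a sketch in Python rather than the question's C#/ASP.NET stack (in .NET the analogue of the worker would be a dedicated `Thread` instead of a `ThreadPool` thread): the request handler only enqueues the feed URL, and a long-running background thread does the actual work.

```python
import queue
import threading

feed_queue = queue.Queue()
results = []  # stands in for writing to the database

def worker():
    """Long-running consumer on its own dedicated thread, so it never
    competes with the threads that serve incoming web requests."""
    while True:
        url = feed_queue.get()
        if url is None:  # sentinel value used to stop the worker
            feed_queue.task_done()
            break
        results.append("fetched " + url)  # placeholder for real fetch/parse
        feed_queue.task_done()

t = threading.Thread(target=worker, daemon=True)
t.start()

# All the web service endpoint would do is hand off the URL and return:
feed_queue.put("http://example.com/feed.xml")
feed_queue.put(None)
t.join()
```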

Upvotes: 0

Julien Genestoux

Reputation: 33012

These indications are not specific to your stack (c#, asp.net), but I would definitely not recommend doing anything from the request-response cycle of your web app. It must be done in an asynchronous fashion, but results can be served from the database that you populate with the feed entries.

  1. It's likely that you'll have to build an architecture where you poll each feed every X minutes. Whether you use a cron job or a daemon that runs continuously, you'll have to poll each feed one after the other (or with some kind of concurrency, but the design is the same). Please make use of HTTP headers like ETag and If-Modified-Since to avoid re-downloading data that hasn't been updated.

  2. Then, you will need to parse the feeds themselves. It's very likely that you'll have to support different flavors of RSS and Atom, but most parsers actually support both.

  3. Finally, you'll have to store the entries and, more importantly, before you insert them, make sure you haven't already added them. You should use the id or guid of the entries, but it's likely that you'll have to use your own scheme too (links, a hash...) because many feeds do not have these.
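The three steps above can be sketched as follows. This is an illustrative Python outline (not the question's C# stack): a real service would use a full feed parser that also handles Atom, and the deduplication check would be a query against a unique index in SQL Server rather than an in-memory set.

```python
import hashlib
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET

def fetch_feed(url, etag=None, last_modified=None):
    """Step 1: conditional GET -- send ETag / If-Modified-Since so the
    server can answer 304 Not Modified and we skip unchanged feeds."""
    req = urllib.request.Request(url)
    if etag:
        req.add_header("If-None-Match", etag)
    if last_modified:
        req.add_header("If-Modified-Since", last_modified)
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return (resp.read(),
                    resp.headers.get("ETag"),
                    resp.headers.get("Last-Modified"))
    except urllib.error.HTTPError as e:
        if e.code == 304:
            return None, etag, last_modified  # nothing new since last poll
        raise

def parse_items(xml_bytes):
    """Step 2: minimal RSS 2.0 parsing; a production service would use a
    library that also supports RSS 1.0 and Atom."""
    root = ET.fromstring(xml_bytes)
    for item in root.iter("item"):
        title = item.findtext("title", "")
        link = item.findtext("link", "")
        guid = item.findtext("guid") or link
        yield title, link, guid

def entry_key(title, link, guid):
    """Step 3: dedup key -- prefer the feed's guid, fall back to a hash of
    link + title for the many feeds that omit ids."""
    return guid or hashlib.sha1((link + title).encode()).hexdigest()

seen = set()  # in production: a SELECT / unique index in SQL Server

def store_new(items):
    inserted = []
    for title, link, guid in items:
        key = entry_key(title, link, guid)
        if key in seen:
            continue  # already stored on a previous poll
        seen.add(key)
        inserted.append((title, link))  # placeholder for the INSERT
    return inserted
```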

If you want to reduce the amount of polling you'll have to do, while still keeping timely results, you'll have to implement PubSubHubbub for the feeds that support it.
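As a minimal sketch of what a PubSubHubbub subscription involves: you POST a form body like the one below to the hub advertised by the feed, and the hub then pushes new entries to your callback URL instead of you polling. The URLs here are placeholders.

```python
from urllib.parse import urlencode

def subscription_payload(callback_url, topic_url):
    """Build the form body for a PubSubHubbub subscribe request.
    callback_url: your endpoint the hub will push new entries to.
    topic_url: the feed URL you want updates for."""
    return urlencode({
        "hub.mode": "subscribe",
        "hub.callback": callback_url,
        "hub.topic": topic_url,
    })
```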

If you don't want to deal with any of the numerous issues outlined above (polling in a timely manner, parsing content, deduplicating entries...), I would recommend using Superfeedr, as it deals with all of these pain points.

Upvotes: 5
