I have a project in which I need to build a service. We will add about 500 RSS feeds from different sites to it, and we want the service to collect new items from these feeds and save each item's Title and URL in my SQL Server database.
How can I determine the best architecture for this, and what code would help me with it?
I won't go into implementation details or a detailed architecture here (mostly for lack of time at the moment), but I will say this:
Don't use the ThreadPool to do this, for two reasons. First, the work can be expected to be fairly time-consuming, and the ThreadPool is recommended primarily for short-running tasks. Second, and perhaps more important, ThreadPool threads are used to serve incoming web requests, and you don't want to compete with that.
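Here is a minimal Python sketch of the idea. It assumes the download, parse and store step is wrapped in a callable `fetch` (hypothetical). A small, fixed number of dedicated, long-lived threads drain a queue of feed URLs, so the work never runs on the pool that serves requests. In C#, the rough equivalent would be creating your own `Thread` instances rather than queuing work items to the ThreadPool.

```python
import queue
import threading


def run_feed_workers(feed_urls, fetch, worker_count=4):
    """Process every feed URL on a few dedicated threads.

    `fetch` is a hypothetical callable (url -> result) standing in for the
    download/parse/store step. Returns {url: result or exception}.
    """
    jobs = queue.Queue()
    for url in feed_urls:
        jobs.put(url)

    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                url = jobs.get_nowait()
            except queue.Empty:
                return  # queue drained: let this thread finish
            try:
                result = fetch(url)
            except Exception as exc:  # one broken feed must not kill the worker
                result = exc
            with lock:
                results[url] = result

    threads = [
        threading.Thread(target=worker, name=f"feed-worker-{i}", daemon=True)
        for i in range(worker_count)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Capping `worker_count` also limits how many of the ~500 sites are contacted at the same time.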
These pointers are not specific to your stack (C#, ASP.NET), but I would definitely recommend against doing any of this inside the request-response cycle of your web app. It should be done asynchronously, in the background, and the results can then be served from the database that you populate with the feed entries.
You will most likely have to build an architecture where you poll each feed every X minutes. Whether you use a cron job or a daemon that runs continuously, you'll have to poll the feeds one after the other (or with some concurrency, but the design is the same). Please make use of HTTP caching headers: send the ETag and Last-Modified values from the previous response back as If-None-Match and If-Modified-Since, so you don't re-download data that hasn't been updated.
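Here is a minimal Python sketch of a conditional GET that uses only the standard library. The ETag and Last-Modified values saved from the previous response are sent back as If-None-Match and If-Modified-Since. A 304 Not Modified answer means there is nothing new to parse. The `opener` parameter exists only so the function can be exercised without network access. Storing the validators per feed between polls is left out.

```python
import urllib.error
import urllib.request


def conditional_headers(etag=None, last_modified=None):
    """Build request headers from the validators of the previous fetch."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def fetch_feed(url, etag=None, last_modified=None,
               opener=urllib.request.urlopen, timeout=30):
    """Fetch a feed only if it changed since the last poll.

    Returns (body, etag, last_modified). body is None on 304 Not Modified,
    in which case the previous validators are returned unchanged.
    """
    request = urllib.request.Request(
        url, headers=conditional_headers(etag, last_modified))
    try:
        with opener(request, timeout=timeout) as response:
            return (response.read(),
                    response.headers.get("ETag", etag),
                    response.headers.get("Last-Modified", last_modified))
    except urllib.error.HTTPError as err:
        if err.code == 304:  # urllib reports 304 as an HTTPError
            return None, etag, last_modified
        raise
```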
Then you will need to parse the feeds themselves. You will very likely have to support different flavors of RSS as well as Atom, but most parsers support both.
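Here is a minimal Python sketch that uses the standard library's ElementTree. It assumes only RSS 2.0 and Atom 1.0 need handling, so RSS 0.9x/1.0 (RDF) feeds and malformed XML are not covered. For each entry it returns the title, the link and the id/guid. In practice, a dedicated feed-parsing library copes with far more real-world variations. Python's documentation also warns that ElementTree is not hardened against maliciously constructed XML.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"


def parse_entries(xml_bytes):
    """Return [{'title', 'link', 'id'}] for an RSS 2.0 or Atom 1.0 document."""
    root = ET.fromstring(xml_bytes)
    entries = []
    if root.tag == "rss":
        for item in root.iter("item"):
            entries.append({
                "title": (item.findtext("title") or "").strip(),
                "link": (item.findtext("link") or "").strip(),
                "id": (item.findtext("guid") or "").strip() or None,
            })
    elif root.tag == ATOM_NS + "feed":
        for entry in root.iter(ATOM_NS + "entry"):
            link = ""
            for link_el in entry.findall(ATOM_NS + "link"):
                # A link without rel defaults to "alternate" in Atom.
                if link_el.get("rel", "alternate") == "alternate":
                    link = link_el.get("href", "")
                    break
            entries.append({
                "title": (entry.findtext(ATOM_NS + "title") or "").strip(),
                "link": link,
                "id": (entry.findtext(ATOM_NS + "id") or "").strip() or None,
            })
    else:
        raise ValueError(f"unsupported feed format: {root.tag}")
    return entries
```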
Finally, you'll have to store the entries and, more importantly, make sure before you insert them that you haven't already added them. You should use the id or guid of each entry, but you will likely need your own scheme too (link, hash...) because many feeds don't provide these.
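Here is a minimal Python sketch of the deduplication step. SQLite again stands in for SQL Server, and the `feed_entries` table is hypothetical. Entries are assumed to be dicts with `title`, `link` and `id` keys. Each entry gets a key from its id/guid when one is present, and otherwise from a SHA-1 hash of link and title. A UNIQUE constraint on (feed_url, entry_key) then lets the database reject duplicates. On SQL Server, the same idea would be a unique index plus an insert guarded by `IF NOT EXISTS` (or a `MERGE`).

```python
import hashlib
import sqlite3


def dedup_key(entry):
    """Prefer the feed's own id/guid; fall back to a hash of link + title."""
    if entry.get("id"):
        return "id:" + entry["id"]
    raw = (entry.get("link", "") + "\n" + entry.get("title", "")).encode("utf-8")
    return "sha1:" + hashlib.sha1(raw).hexdigest()


def create_schema(conn):
    """Hypothetical table; the UNIQUE constraint does the actual dedup."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS feed_entries (
               id        INTEGER PRIMARY KEY AUTOINCREMENT,
               feed_url  TEXT NOT NULL,
               entry_key TEXT NOT NULL,
               title     TEXT,
               url       TEXT,
               UNIQUE (feed_url, entry_key))"""
    )


def store_new_entries(conn, feed_url, entries):
    """Insert entries not seen before; return how many rows were added."""
    before = conn.total_changes
    with conn:  # one transaction per feed
        for entry in entries:
            conn.execute(
                "INSERT OR IGNORE INTO feed_entries (feed_url, entry_key, title, url) "
                "VALUES (?, ?, ?, ?)",
                (feed_url, dedup_key(entry), entry.get("title"), entry.get("link")),
            )
    return conn.total_changes - before
```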
If you want to reduce the amount of polling that you'll have to do, while still keeping timely results, you'll have to implement PubSubHubbub for the feeds which support it.
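Here is a minimal Python sketch of the two subscriber-side pieces of PubSubHubbub, which the W3C has since standardized as WebSub. It assumes the hub URL has already been discovered from the feed's `hub` link and that you expose a public callback URL. The first piece builds the form-encoded subscription request that is POSTed to the hub. The second handles verification of intent: the subscriber echoes `hub.challenge` only for topics it actually asked for and otherwise answers 404. Receiving and authenticating the content pushes that follow is not shown.

```python
import urllib.parse


def subscription_request_body(topic_url, callback_url, lease_seconds=None):
    """Form-encoded body to POST to the hub to subscribe to one feed."""
    params = {
        "hub.mode": "subscribe",
        "hub.topic": topic_url,
        "hub.callback": callback_url,
    }
    if lease_seconds is not None:
        params["hub.lease_seconds"] = str(lease_seconds)
    return urllib.parse.urlencode(params)


def verify_intent(query_string, expected_topics):
    """Handle the hub's GET verification request on the callback.

    Returns (status, body): echo hub.challenge with 200 for topics we asked
    for, otherwise 404 so the hub discards the (un)subscription.
    """
    q = urllib.parse.parse_qs(query_string)
    mode = q.get("hub.mode", [""])[0]
    topic = q.get("hub.topic", [""])[0]
    challenge = q.get("hub.challenge", [""])[0]
    if mode in ("subscribe", "unsubscribe") and topic in expected_topics and challenge:
        return 200, challenge
    return 404, ""
```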
If you don't want to deal with any of the numerous issues described above (polling in a timely manner, parsing content, diffing to keep entries unique...), I would recommend using Superfeedr, as it deals with all of these pain points.