Reputation: 470
I am trying to create my own RSS reader app in Ruby on Rails. I want to be able to store various news stories in my database that I can pull from later to display each story with its headline, image, summary, etc. in a nice layout. I am working with the Feedjira library and am also pretty new to RoR. I know that these two commands in the Rails console fetch RSS feeds and somehow parse them:
urls = %w[http://feedjira.com/blog/feed.xml https://github.com/feedjira/feedjira/feed.xml]
feeds = Feedjira::Feed.fetch_and_parse urls
While these two commands work on RSS feeds, I was wondering how I could configure my database/model and then save the news entries I get from Feedjira into the DB. I tried watching the RailsCast on this topic, but it seemed a bit out of date. Any help would be immensely appreciated! Thanks in advance!
Upvotes: 3
Views: 2002
Reputation: 3780
Here's one way:
Create a model such as this:
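class Entry < ActiveRecord::Base
  # attr_accessible is Rails 3 style (Rails 4+ needs the protected_attributes gem);
  # it must list the columns that create! assigns below.
  attr_accessible :entry_id, :feed_id, :url, :title, :summary, :description, :published_at

  def self.update_from_feed(feed_name)
    feed = Feed.find_by_name(feed_name)
    feed_data = Feedjira::Feed.fetch_and_parse(feed.feed_url)
    add_entries(feed_data.entries, feed)
  end

  def self.add_entries(entries, feed)
    entries.each do |entry|
      # Feedjira exposes each item's guid/id as entry_id.
      break if exists?(:entry_id => entry.entry_id)
      create!(
        :entry_id     => entry.entry_id,
        :feed_id      => feed.id,
        :url          => entry.url,
        # String#sanitize is provided (via Loofah) by older Feedjira releases;
        # the nil checks guard against fields a feed leaves out.
        :title        => entry.title && entry.title.sanitize,
        :summary      => entry.summary && entry.summary.sanitize,
        :description  => entry.content && entry.content.sanitize,
        :published_at => entry.published
      )
    end
  end
  # `private` has no effect on class methods, so hide the helper explicitly.
  private_class_method :add_entries
end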
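Feeds frequently omit fields such as summary or content, so the step that maps a parsed entry onto table columns has to tolerate nil. The plain-Ruby sketch below isolates that mapping so it can be checked without a database. ParsedEntry is a hypothetical stand-in for a Feedjira entry (it only mimics the readers used above), and crude_strip_tags is a hypothetical, deliberately simple tag stripper, not a real sanitizer.

```ruby
# Hypothetical stand-in for a parsed Feedjira entry (only the readers used above).
ParsedEntry = Struct.new(:entry_id, :url, :title, :summary, :content, :published,
                         keyword_init: true)

# Hypothetical helper: crude tag removal for illustration only.
# Returns nil when the field is missing instead of raising.
def crude_strip_tags(html)
  html&.gsub(/<[^>]*>/, "")&.strip
end

# Build the attribute hash that Entry.create! would receive.
def entry_attributes(entry, feed_id)
  {
    entry_id:     entry.entry_id,
    feed_id:      feed_id,
    url:          entry.url,
    title:        crude_strip_tags(entry.title),
    summary:      crude_strip_tags(entry.summary),   # may be nil
    description:  crude_strip_tags(entry.content),   # may be nil
    published_at: entry.published
  }
end

entry = ParsedEntry.new(entry_id: "tag:example.com,2014:1", url: "http://example.com/1",
                        title: "<b>Hello</b>", summary: nil,
                        content: "<p>Body</p>", published: Time.utc(2014, 1, 1))
p entry_attributes(entry, 7)
```

In the real model, the same hash would simply be passed to create!, with a proper sanitizer in place of the crude helper.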
You can then call this from the command line, cron, or similar, for example:
rails runner -e development 'Entry.update_from_feed("feedname")'
This runs the update_from_feed method in the context of your Rails app using a separate Rails process (a bit like rails console), so it doesn't affect the running Rails instance.
In this example, there's a separate Feed model with name and feed_url columns, so the feed's URL is looked up from the provided name.
This code doesn't use Feedjira's ability to check for updates, so duplicate checking is built into add_entries instead. (A GitHub issue on the Feedjira project advises against using the #update method.)
Note that the use of break assumes that new entries are always added to the top of the feed. If you don't trust the feed's ordering, replace break with next, so that entries already stored are skipped and the rest of the feed is still checked. The URL can be used as an alternative unique ID.
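The difference between stopping at the first known entry and skipping known entries is easy to see in a small plain-Ruby sketch. Here known_ids is a hypothetical stand-in for the entry_ids already in the database, and the feed is assumed to list entries newest first; the two method names are illustrative only.

```ruby
require "set"

# Stop at the first already-stored entry (fast, but trusts the feed's ordering).
def new_ids_with_break(entry_ids, known_ids)
  fresh = []
  entry_ids.each do |id|
    break if known_ids.include?(id)
    fresh << id
  end
  fresh
end

# Skip already-stored entries but keep scanning the whole feed (safer).
def new_ids_with_next(entry_ids, known_ids)
  fresh = []
  entry_ids.each do |id|
    next if known_ids.include?(id)
    fresh << id
  end
  fresh
end

known = Set["b"]
feed  = %w[a b c]                 # "c" is new but sits below a stored entry
p new_ids_with_break(feed, known) # => ["a"]
p new_ids_with_next(feed, known)  # => ["a", "c"]
```

With a feed that inserts or reorders items, the break version silently misses "c", which is why skipping is the safer choice when the ordering can't be trusted.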
Edit:
Here's a version of the update_from_feed method that takes advantage of Feedjira's ability to process multiple feeds:
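def self.update_all
  # Note: this overrides ActiveRecord's built-in Entry.update_all; a name such as
  # update_all_feeds (with its callers renamed to match) avoids the clash.
  feed_urls = Feed.pluck(:feed_url)
  feeds = Feedjira::Feed.fetch_and_parse(feed_urls)  # Hash of url => parsed feed
  feed_urls.each do |feed_url|
    feed_data = feeds[feed_url]
    # A failed fetch is not a feed object (older Feedjira returned a status code).
    next unless feed_data.respond_to?(:entries)
    feed = Feed.find_by_feed_url(feed_url)
    add_entries(feed_data.entries, feed)
  end
end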
pluck returns the values of the specified column(s) (:feed_url in this case) for every row, as an array. Equally, you could change the method to accept an array of feed names, look up their URLs, and pass those to Feedjira.
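One detail worth handling when fetching many feeds at once is a URL that fails: indexing the result hash and calling .entries on a failure value would raise. The sketch below assumes, as in older Feedjira/Feedzirra releases, that fetch_and_parse(urls) returns a Hash of url => parsed feed and that a failed fetch shows up as a non-feed value such as an HTTP status code. FakeFeed and entries_by_url are hypothetical names used only for illustration.

```ruby
# Hypothetical stand-in for a parsed feed object.
FakeFeed = Struct.new(:entries)

# Keep only results that look like feeds; report and skip the rest.
def entries_by_url(results)
  results.each_with_object({}) do |(url, feed), out|
    if feed.respond_to?(:entries)
      out[url] = feed.entries
    else
      warn "skipping #{url}: fetch failed (#{feed.inspect})"
    end
  end
end

results = {
  "http://example.com/a.xml" => FakeFeed.new(%w[post1 post2]),
  "http://example.com/b.xml" => 404   # assumed failure marker
}
p entries_by_url(results)
```

One bad feed then costs a warning instead of aborting the whole update run.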
Finally, if you wanted a self-looping method, you could include:
def self.update_all_periodically(frequency = 15.minutes)
  loop do
    update_all            # the multi-feed update defined above
    sleep frequency.to_i
  end
end
Then run this:
rails runner -e development 'Entry.update_all_periodically'
It won't return until you interrupt the process (e.g. with Ctrl-C), and it updates all feeds every 15 minutes by default, or at the frequency passed as the optional argument.
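As written, update_all_periodically stops entirely the first time an update raises (for example on a network error), because the exception propagates out of the loop. A minimal plain-Ruby sketch of a loop that logs the error and carries on is below; the block stands in for Entry.update_all, and run_periodically and its max_runs parameter are hypothetical, with max_runs added only so the loop can terminate in a test (pass nil to loop forever).

```ruby
# Run the given block every frequency_seconds, surviving individual failures.
# max_runs is a hypothetical test hook; nil means run until interrupted.
def run_periodically(frequency_seconds, max_runs: nil)
  runs = 0
  loop do
    begin
      yield
    rescue StandardError => e
      warn "update failed: #{e.class}: #{e.message}"
    end
    runs += 1
    break if max_runs && runs >= max_runs
    sleep frequency_seconds
  end
  runs
end

calls = 0
run_periodically(0.01, max_runs: 3) do
  calls += 1
  raise "network down" if calls == 2 # simulate one failed fetch
end
p calls # => 3
```

Inside the model, the same pattern would wrap the call to update_all, and the frequency would be an ActiveSupport duration converted with to_i as in the original.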
If you want the updates to run asynchronously alongside your main Rails app, a background job system such as Sidekiq, Resque, or Delayed Job will do the... job. :)
Upvotes: 2
Reputation: 33012
Scheduling the fetching and parsing of all these feeds can be incredibly hard and time-consuming, which means you should absolutely not do it from inside the Rails app itself. Ideally, you should do it from an 'offline' script.
You could also simply rely on existing APIs like Superfeedr and its Rack middleware.
Upvotes: 0