Reputation: 464
I need to populate a database with data from an RSS feed. Is there any way to ensure that I don't populate the database with duplicate information?
I don't want to compare data in the database to determine if I have duplicate information, as this would be very slow.
This is similar to the question "How to detect changed and new items in an RSS feed?", but the answer there is not what I am looking for.
Upvotes: 0
Views: 2573
Reputation: 32972
I believe the title of your question and your description of it do not match :)
If you want to get notified when an RSS feed updates, you'll have to use the PubSubHubbub protocol, which is designed for that. It only works, though, if the publisher supports it in its feeds. For all other feeds, you can check Superfeedr (I created Superfeedr!).
Now, if you're wondering how to make sure you do not save the same data twice, the recommended method is to store the <guid> element for RSS, or the <id> element for Atom, in your datastore. This involves comparing the items in a feed with the ones you've previously stored. It should not be too costly for most feeds, as they usually don't include hundreds of entries.
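One way to keep that comparison cheap is to let the database do it through a unique key on the identifier. The following is only a minimal sketch, assuming an RSS 2.0 feed and SQLite; the table name `items`, the helper names `open_store` and `store_new_items`, and the sample feed are all made up for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample feed, used only to illustrate the idea.
SAMPLE_RSS = """<rss version="2.0"><channel><title>Example</title>
<item><title>First</title><guid>urn:example:1</guid></item>
<item><title>Second</title><guid>urn:example:2</guid></item>
</channel></rss>"""


def open_store(path=":memory:"):
    """Open a SQLite store whose primary key is the item's guid."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items (guid TEXT PRIMARY KEY, title TEXT)"
    )
    return conn


def store_new_items(conn, rss_text):
    """Insert items from the feed; return how many were actually new."""
    root = ET.fromstring(rss_text)
    added = 0
    for item in root.iter("item"):
        guid = item.findtext("guid")
        if not guid:
            continue  # items without a guid are skipped in this sketch
        # INSERT OR IGNORE leaves the row untouched if the guid already exists.
        cur = conn.execute(
            "INSERT OR IGNORE INTO items (guid, title) VALUES (?, ?)",
            (guid, item.findtext("title")),
        )
        added += cur.rowcount  # 1 for a new row, 0 for an ignored duplicate
    conn.commit()
    return added


if __name__ == "__main__":
    conn = open_store()
    print(store_new_items(conn, SAMPLE_RSS))  # first fetch of the feed
    print(store_new_items(conn, SAMPLE_RSS))  # same feed fetched again
```

Because the guid is the primary key, the duplicate check is an index lookup rather than a scan over the stored data. For Atom feeds the same approach would key on `<id>`, which needs namespace-aware parsing that this sketch does not cover.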
Upvotes: 0
Reputation: 3123
You will usually want to use the guid element of an item to check for duplicates.
If you already know an item's guid, you have already seen that item.
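As a rough illustration of that check, assuming the known guids fit in memory as a Python set (the function name `unseen_guids` is hypothetical):

```python
def unseen_guids(feed_guids, seen):
    """Return guids not seen before, in feed order, and remember them."""
    fresh = []
    for guid in feed_guids:
        if guid not in seen:
            seen.add(guid)  # remember it so a later repeat is skipped
            fresh.append(guid)
    return fresh


if __name__ == "__main__":
    seen = {"urn:example:1"}
    print(unseen_guids(["urn:example:1", "urn:example:2"], seen))
```

Set membership is a constant-time lookup on average, so the check stays cheap even when many guids are already known.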
Upvotes: 2