Chris Burton

Reputation: 1205

Trying to get INSERT to only insert new data

I have an RSS feed from Readability that I'm using to keep a record of articles that I've read. I'm grabbing the titles and URLs and inserting them into the database for my own use.

However, my INSERT seems to take the entire feed and try to reinsert it every time, which causes a duplicate-entry error (see here). I know I can suppress that error by using INSERT IGNORE, but is there another way to go about this?

Possibly by doing something like this:

Check DB for last entry => Compare last entry to array data => INSERT what isn't there into DB.

Upvotes: 0

Views: 107

Answers (2)

cwallenpoole

Reputation: 82028

There is no shame in INSERT IGNORE. Use it and be merry! (Seriously, data-integrity logic that you have to handle manually is annoying and more error-prone.)

Most SQL dialects have some concept of merging data, and this just happens to be the way that MySQL handles it. This means that not only will INSERT IGNORE be a fast and easy way of handling data, it will also have the novelty of being good practice.
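
A minimal sketch of the pattern, for illustration only. It uses Python's built-in sqlite3 module so that it runs without a server, and SQLite spells the statement INSERT OR IGNORE; on MySQL the equivalent is INSERT IGNORE against a column that has a PRIMARY KEY or UNIQUE index. The articles table, its columns, and the helper names are assumptions, not taken from the question.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical articles table."""
    conn = sqlite3.connect(":memory:")
    # The PRIMARY KEY on url is what lets the database reject duplicates.
    conn.execute(
        "CREATE TABLE articles (url TEXT PRIMARY KEY, title TEXT NOT NULL)"
    )
    return conn


def save_articles(conn, items):
    """Insert (url, title) pairs, skipping URLs that are already stored.

    Returns the number of rows that were actually added.
    """
    before = conn.execute("SELECT COUNT(*) FROM articles").fetchone()[0]
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            # SQLite syntax; in MySQL this would be INSERT IGNORE INTO ...
            "INSERT OR IGNORE INTO articles (url, title) VALUES (?, ?)",
            items,
        )
    after = conn.execute("SELECT COUNT(*) FROM articles").fetchone()[0]
    return after - before
```

Feeding the whole feed to save_articles on every run is then harmless: rows whose url already exists are skipped by the database rather than by application code.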

Your other problem is that RSS itself doesn't offer much of a shortcut. I really like @AaronMiller's suggestion, but the pubDate element is optional, meaning that unless you have complete control over the feed (and I would guess that you don't, based on the fact that you're worried about storing it), you won't be able to rely on it being there.


For that matter, the only thing RSS 2.0 guarantees about an item is that it has at least one of title or description; every other element, link included, is optional. There is no guarantee that the feed won't change at some future date and drop the title or the link elements. Given that, it might be a good idea to use INSERT IGNORE and pair it with some sort of hash of the item's content to boot.
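
One way the hash idea could look, as a rough sketch rather than a recommendation of specific fields. It assumes each feed item has already been parsed into a dict with optional "link", "title", and "description" keys; the items table and the function names are hypothetical, and SQLite's INSERT OR IGNORE again stands in for MySQL's INSERT IGNORE.

```python
import hashlib
import sqlite3


def item_key(item):
    """Return a stable SHA-256 hex digest for a parsed feed item (a dict)."""
    # Missing or empty fields become "", so the key still works if the
    # feed stops sending one of them.
    parts = [item.get(field) or "" for field in ("link", "title", "description")]
    # Join with a separator that is unlikely to appear in feed text.
    return hashlib.sha256("\x1f".join(parts).encode("utf-8")).hexdigest()


def make_items_db():
    """Create an in-memory database keyed on the item hash."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE items (item_key TEXT PRIMARY KEY, title TEXT, link TEXT)"
    )
    return conn


def store_items(conn, items):
    """Insert items, letting the unique hash column reject repeats."""
    rows = [(item_key(i), i.get("title"), i.get("link")) for i in items]
    with conn:
        conn.executemany(
            "INSERT OR IGNORE INTO items (item_key, title, link) VALUES (?, ?, ?)",
            rows,
        )
```

Which fields go into the hash is a design choice: including description means an edited description produces a new row, while hashing only link would treat the edit as the same article.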

Upvotes: 1

Aaron Miller

Reputation: 3780

You've got the right idea, sure; you could either get the most recent datetime from the database and only insert items newer than that, or (if you want to be really complete) get everything from the database, compare against everything in the feed, and only insert items which don't match something already in the database. But if you really want INSERT only to insert new data, as implied in your question title, then INSERT IGNORE is the way to go, and doubtless the simplest implementation as well. Unless you've got a concern about the amount of traffic on the database, I'd stick with it.
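
For comparison, the two manual approaches could be sketched roughly like this. The sketch assumes each feed item is a dict with a "link" and a "published" datetime, and that the latest stored datetime and the set of stored links have already been fetched from the database; all of these names are made up for illustration.

```python
from datetime import datetime


def newer_than(feed_items, last_seen):
    """Keep only items published after the newest datetime in the database.

    Assumes every item carries a "published" datetime. Pass None for
    last_seen when the table is still empty.
    """
    if last_seen is None:
        return list(feed_items)
    return [item for item in feed_items if item["published"] > last_seen]


def not_yet_stored(feed_items, known_links):
    """Keep only items whose link is not already in the database.

    known_links is a set of every link currently stored.
    """
    return [item for item in feed_items if item["link"] not in known_links]
```

Either function needs an extra SELECT before the INSERT, and the application becomes responsible for getting the comparison right, which is the bookkeeping that INSERT IGNORE hands back to the database.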

Upvotes: 1
