Reputation: 1746
I have a large database of elements, each of which has a unique key. Every so often (once a minute) I get a load more items which need to be added to the database, but if they are duplicates of something already in the database they are discarded.
My question is - is it better to:
1. query the database for the existing keys and filter out duplicates in Python before inserting,
2. attempt each insert and catch the duplicate-key error in a try/except, or
3. push the duplicate handling down into the database itself?
Cheers, Jack
Upvotes: 4
Views: 321
Reputation: 1194
If you're using MySQL, you have the power of INSERT IGNORE at your fingertips, and that would be the most performant solution. You can execute custom SQL queries using the cursor API directly (https://docs.djangoproject.com/en/1.9/topics/db/sql/#executing-custom-sql-directly).
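A minimal sketch of that, assuming a table myapp_element with columns key and payload (both names hypothetical; key is backtick-quoted because KEY is a reserved word in MySQL):

from django.db import connection

def bulk_insert_ignore(items):
    # MySQL itself silently discards rows whose key already exists.
    sql = "INSERT IGNORE INTO myapp_element (`key`, payload) VALUES (%s, %s)"
    with connection.cursor() as cursor:
        cursor.executemany(sql, [(item.key, item.payload) for item in items])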
If you are using Postgres or some other data store that does not support INSERT IGNORE, then things are going to be a bit more complicated. In the case of Postgres, you can use rules to essentially make your own version of INSERT IGNORE.
It would look something like this:
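-- Silently skip any INSERT whose pk already exists in some_table.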
CREATE RULE "insert_ignore" AS ON INSERT TO "some_table"
WHERE EXISTS (SELECT 1 FROM some_table WHERE pk=NEW.pk) DO INSTEAD NOTHING;
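If the schema is managed with Django migrations, the rule can be installed with a RunSQL operation; a sketch, with the app label and dependency name as placeholders:

from django.db import migrations

class Migration(migrations.Migration):

    dependencies = [
        ('myapp', '0001_initial'),  # placeholder app/migration name
    ]

    operations = [
        migrations.RunSQL(
            sql='''
                CREATE RULE "insert_ignore" AS ON INSERT TO "some_table"
                WHERE EXISTS (SELECT 1 FROM some_table WHERE pk = NEW.pk)
                DO INSTEAD NOTHING;
            ''',
            reverse_sql='DROP RULE "insert_ignore" ON "some_table";',
        ),
    ]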
Whatever you do, avoid the "select all rows and check first" approach, as the check is O(n) in the size of the table and runs in Python on the app machine, which negates any performance advantage afforded by your database and eventually becomes memory-bound as well.
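For the record, that anti-pattern looks something like this (Element is a hypothetical model):

# Anti-pattern: every existing key is fetched into app-server memory.
existing = set(Element.objects.values_list('key', flat=True))
new_items = [item for item in items if item.key not in existing]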
The try/except approach is marginally better than the "select all rows" approach, but it still hands every conflict off to the app server to deal with, albeit much more quickly. Better to make the database do the work.
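That try/except version would look roughly like this (again with a hypothetical Element model; the per-row atomic block matters on Postgres, where an unhandled IntegrityError otherwise aborts the enclosing transaction):

from django.db import IntegrityError, transaction

for item in items:
    try:
        with transaction.atomic():
            Element.objects.create(key=item.key, payload=item.payload)
    except IntegrityError:
        pass  # duplicate key; discard the item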
Upvotes: 1