Reputation: 1746
I have a large database of elements, each of which has a unique key. Every so often (once a minute) I get a load more items which need to be added to the database, but if they are duplicates of something already in the database they are discarded.
My question is - is it better to:
1. query the database for the existing keys and filter out duplicates in Python before inserting,
2. attempt each insert and catch the duplicate-key error in a try/except, or
3. push the duplicate handling down into the database itself?
Cheers, Jack
Upvotes: 4
Views: 321
Reputation: 1194
If you're using MySQL, you have the power of INSERT IGNORE at your fingertips, and that would be the most performant solution. You can execute custom SQL queries using the cursor API directly (https://docs.djangoproject.com/en/1.9/topics/db/sql/#executing-custom-sql-directly).
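A minimal sketch of that, assuming a table myapp_element with columns key and payload (both names hypothetical; key is backtick-quoted because KEY is a reserved word in MySQL):

from django.db import connection

def bulk_insert_ignore(items):
    # MySQL itself silently discards rows whose key already exists.
    sql = "INSERT IGNORE INTO myapp_element (`key`, payload) VALUES (%s, %s)"
    with connection.cursor() as cursor:
        cursor.executemany(sql, [(item.key, item.payload) for item in items])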
If you are using Postgres or some other data store that does not support INSERT IGNORE, then things are going to be a bit more complicated. In the case of Postgres, you can use rules to essentially make your own version of INSERT IGNORE.
It would look something like this:
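-- Silently skip any INSERT whose pk already exists in some_table.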
CREATE RULE "insert_ignore" AS ON INSERT TO "some_table"
WHERE EXISTS (SELECT 1 FROM some_table WHERE pk=NEW.pk) DO INSTEAD NOTHING;
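If the schema is managed with Django migrations, the rule can be installed with a RunSQL operation; a sketch, with the app label and dependency name as placeholders:

from django.db import migrations

class Migration(migrations.Migration):

    dependencies = [
        ('myapp', '0001_initial'),  # placeholder app/migration name
    ]

    operations = [
        migrations.RunSQL(
            sql='''
                CREATE RULE "insert_ignore" AS ON INSERT TO "some_table"
                WHERE EXISTS (SELECT 1 FROM some_table WHERE pk = NEW.pk)
                DO INSTEAD NOTHING;
            ''',
            reverse_sql='DROP RULE "insert_ignore" ON "some_table";',
        ),
    ]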
Whatever you do, avoid the "select all rows and check first" approach, as the check is O(n) in the size of the table and runs in Python on the app machine, which negates any performance advantage afforded by your database and eventually becomes memory-bound as well.
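For the record, that anti-pattern looks something like this (Element is a hypothetical model):

# Anti-pattern: every existing key is fetched into app-server memory.
existing = set(Element.objects.values_list('key', flat=True))
new_items = [item for item in items if item.key not in existing]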
The try/except approach is marginally better than the "select all rows" approach, but it still hands every conflict off to the app server to deal with, albeit much more quickly. Better to make the database do the work.
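That try/except version would look roughly like this (again with a hypothetical Element model; the per-row atomic block matters on Postgres, where an unhandled IntegrityError otherwise aborts the enclosing transaction):

from django.db import IntegrityError, transaction

for item in items:
    try:
        with transaction.atomic():
            Element.objects.create(key=item.key, payload=item.payload)
    except IntegrityError:
        pass  # duplicate key; discard the item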
Upvotes: 1