senderle

Reputation: 150977

Does Django provide any built-in way to update PostgreSQL autoincrement counters?

I'm migrating a Django site from MySQL to PostgreSQL. The quantity of data isn't huge, so I've taken a very simple approach: I've just used the built-in Django serialize and deserialize routines to create JSON records, and then load them in the new instance, loop over the objects, and save each one to the new database.

This works very nicely, with one hiccup: after loading all the records, I run into an IntegrityError when I try to add new data after loading the old records. The Postgres equivalent of a MySQL autoincrement ID field is a serial field, but the internal counter for serial fields isn't incremented when id values are specified explicitly. As a result, Postgres tries to start numbering records at 1 -- already used -- causing a constraint violation. (This is a known issue in Django, marked wontfix.)

There are quite a few questions and answers related to this, but none of the answers seem to address the issue directly in the context of Django. This answer gives an example of the query you'd need to run to update the counter, but I try to avoid making explicit queries when possible. I could simply delete the ID field before saving and let Postgres do the numbering itself, but there are ForeignKey references that will be broken in that case. And everything else works beautifully!

It would be nice if Django provided a routine for doing this that intelligently handles any edge cases. (This wouldn't fix the bug, but it would allow developers to work around it in a consistent and correct way.) Do we really have to just use a raw query to fix this? It seems so barbaric.

If there's really no such routine, I will simply do something like the below, which directly runs the query suggested in the answer linked above. But in that case, I'd be interested to hear about any potential issues with this approach, or any other information about what I might be doing wrong. For example, should I just modify the records to use UUIDs instead, as this suggests?

Here's the raw approach (edited to reflect a simplified version of what I actually wound up doing). It's pretty close to Pere Picornell's answer, but his looks more robust to me.

from django.db import connection

table = model._meta.db_table
with connection.cursor() as cursor:
    # Reset the serial's counter to the current maximum id so the next
    # INSERT gets a fresh, unused value.
    cursor.execute(
        "SELECT setval('{0}_id_seq', (SELECT max(id) FROM {0}))".format(table)
    )

Upvotes: 6

Views: 910

Answers (2)

Pere Picornell

Reputation: 1186

About the debate: my case is a one-time migration, and my decision was to run this function right after finishing each table's migration, although you could call it any time you suspect the sequence has fallen out of sync.

    from django.db import connections

    def synchronize_last_sequence(model):
        # PostgreSQL auto-increments (sequences) don't advance their internal
        # counter when you insert rows with explicit IDs.
        # This sets the sequence to the table's current maximum id.
        sequence_name = "{}_{}_seq".format(model._meta.db_table, model._meta.pk.name)
        with connections['default'].cursor() as cursor:
            cursor.execute(
                "SELECT setval('{}', (SELECT max({}) FROM {}))".format(
                    sequence_name, model._meta.pk.name, model._meta.db_table
                )
            )
        print("Sequence {} synchronized with the last auto-increment value.".format(sequence_name))

I built this on the SQL query you proposed in your question. It's been very useful to find your post. Thank you!

It should work with custom PKs but not with multi-field PKs.

Upvotes: 1

schillingt

Reputation: 13731

One option is to use natural keys during serialization and deserialization. That way, when you insert the data into PostgreSQL, it will auto-increment the primary key field itself and keep everything consistent.

The downside to this approach is that you need to have a set of unique fields for each model that don't include the id.
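To illustrate the idea without any framework machinery, here is a framework-free sketch (the data and field names are made up): rows are serialized by a unique business field instead of their integer primary key, so the target database is free to assign fresh ids on insert, and foreign keys are resolved back through the natural key.

```python
# Source-database rows, with the old integer ids we want to discard.
authors = [{"id": 7, "username": "alice"}, {"id": 9, "username": "bob"}]
posts = [{"id": 3, "author_id": 7, "title": "Hello"}]

# Serialize: replace the FK integer with the natural key (username).
by_id = {a["id"]: a["username"] for a in authors}
serialized_posts = [
    {"author": by_id[p["author_id"]], "title": p["title"]} for p in posts
]

# Deserialize into the "new database": authors receive fresh ids
# (as Postgres would via the sequence), in insertion order.
new_authors = {}
for new_id, a in enumerate(authors, start=1):
    new_authors[a["username"]] = new_id

# Posts resolve the natural key back to the new id.
restored = [
    {"author_id": new_authors[p["author"]], "title": p["title"]}
    for p in serialized_posts
]
print(restored)  # [{'author_id': 1, 'title': 'Hello'}]
```

In real Django this is what `dumpdata --natural-foreign --natural-primary` does, given a `natural_key()` method on the model and a `get_by_natural_key()` method on its manager.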

Upvotes: 1
