mlissner
mlissner

Reputation: 18186

How to set primary key, then convert to autofield?

I have a model called Document, and I want to add a new table, DocumentCluster that sits above it, with a foreign key to Document.

class DocumentCluster(models.Model):
    sub_document = models.ForeignKey(Document)
    ...lots of fields here...

When I add this table using South, I need to fill it in by setting the primary key and the foreign key to the same value.

For example, if I currently have a Document object with a pk of 12, the new DocumentCluster object will have a pk of 12 and a foreign key to Document number 12.

While it may seem strange that we need the DocumentCluster pk values to match the foreign key values there is an important reason. We use the Document pk in our URLs, but after the change the URLs will load a DocumentCluster, not a Document, so we'll need the pk in DocumentCluster to be the same as it was in Document.

Once that's done, I want the PK of the DocumentCluster to be an AutoField, incrementing from the highest value that was migrated.

Can this be done?

Upvotes: 7

Views: 751

Answers (2)

nmgeek
nmgeek

Reputation: 2197

Using south this is much simpler than I thought it would be.

Use the south schemamigration command as usual to build the schema migration that adds the DocumentCluster model.

Then use the datamigration command to build a skeleton for the next migration:

./manage.py datamigration yourapp populate_clusters

Then fill in the forwards method in the resulting python migration file so it looks like:

def forwards(self, orm):
    max_id = -1
    clusters_added = 0                                                                                                         
    for document in orm.Document.objects.all():
        cluster = orm.DocumentCluster()
        cluster.id = document.id
        cluster.sub_document = document
        cluster.save()
        max_id = max(max_id, document.id)
        clusters_added +=1
    if max_id >= clusters_added:
        orm.DocumentCluster.objects.raw("SELECT "\
           "setval(pg_get_serial_sequence('yourapp_documentcluster',"\
                                         "'id'), %d)" % max_id+1)

(The reverse method in this datamigration would simply delete all the DocumentModel instances.)

If this 'populate_clusters' migration step is run right after the migration step that added the DocumentCluster table then the DocumentCluster table is empty and the serial/autokey counter starts at zero. If you never deleted any Documents then the serial/autokey counter value will end up at the next unused value for Document pk values and you don't even have to bump it up as shown in Erwin's answer.

If, however, you have deleted Document instances or if somehow an id value was skipped then you'll need to bump up that serial/autokey counter using SQL. Django provides a way to directly execute raw SQL. (I don't think you can do without using raw SQL).

For safety sake you should check if cluster.id is <= the maximum document.id value seen in the loop.

Upvotes: 1

Erwin Brandstetter
Erwin Brandstetter

Reputation: 657617

You cannot have a FK from DocumentCluster.pk to Document and at the same time make DocumentCluster.pk a serial column. That's a contradiction. It would have to be two separate columns. I suppose you mean the other way round: from Document to DocumentCluster.

This can be done with SQL DDL commands:

BEGIN;

CREATE TABLE DocumentCluster (pk serial, lots text, of_fields int, ...);

INSERT INTO DocumentCluster (pk, lots, of_fields, ...)
SELECT pk, lots, of_fields, ... FROM Document
ORDER  BY pk; -- optional

ALTER TABLE DocumentCluster ADD CONSTRAINT DocumentCluster_pkey
PRIMARY KEY (pk);  -- could be integrated in CREATE TABLE

ALTER TABLE Document  -- and not the other way round!
ADD   CONSTRAINT pk_fk FOREIGN KEY (pk) REFERENCES DocumentCluster(pk);

SELECT setval(pg_get_serial_sequence('DocumentCluster', 'pk'), max(pk))
FROM   DocumentCluster;  -- update SEQUENCE to highest value

COMMIT;
  • Your obfuscation layer (Django) may be using double-quoted names like "Document", in which case you'd have to do the same ...

  • About setval().

  • About pg_get_serial_sequence()

  • The PK could be declared in the CREATE TABLE statement, but it's cheaper to add it after rows have been inserted - assuming pk is unique not null.

More explanation and links:

Upvotes: 1

Related Questions